Paper Title
Unsupervised Subword Modeling Using Autoregressive Pretraining and Cross-Lingual Phone-Aware Modeling
Paper Authors
Paper Abstract
This study addresses unsupervised subword modeling, i.e., learning feature representations that can distinguish the subword units of a language. The proposed approach adopts a two-stage bottleneck feature (BNF) learning framework, consisting of autoregressive predictive coding (APC) as a front-end and a DNN-BNF model as a back-end. APC-pretrained features serve as the input to the DNN-BNF model. A language-mismatched ASR system provides cross-lingual phone labels for DNN-BNF model training. Finally, BNFs are extracted as the subword-discriminative feature representation. A second aim of this work is to investigate how robust the approach is to different amounts of training data. Results on the Libri-light and ZeroSpeech 2017 databases show that APC is effective for front-end feature pretraining. Our full system outperforms the state of the art on both databases. Cross-lingual phone labels for English data generated by a Dutch ASR system outperform those generated by a Mandarin ASR system, possibly because Dutch is more similar to English than Mandarin is. Our system is less sensitive to the amount of training data once it exceeds 50 hours. APC pretraining reduces the required training material from over 5,000 hours to around 200 hours with little performance degradation.
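To make the two-stage pipeline in the abstract concrete, the following is a minimal, illustrative PyTorch sketch, not the authors' implementation; the layer sizes, prediction shift, phone-set size, and toy data are assumptions. It shows an APC front-end (an RNN trained to predict future frames) feeding a DNN-BNF back-end (a classifier over cross-lingual phone labels with a low-dimensional bottleneck whose activations are extracted as BNFs).

```python
# Minimal sketch of the two-stage APC + DNN-BNF framework.
# Assumptions (not from the paper): PyTorch, 40-dim FBANK inputs, invented layer sizes.
import torch
import torch.nn as nn

class APC(nn.Module):
    """Front-end: autoregressive predictive coding.
    An RNN is trained to predict the frame k steps ahead; its hidden
    states are later used as pretrained input features."""
    def __init__(self, feat_dim=40, hidden_dim=512, num_layers=3):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden_dim, num_layers, batch_first=True)
        self.proj = nn.Linear(hidden_dim, feat_dim)

    def forward(self, x):                       # x: (batch, time, feat_dim)
        h, _ = self.rnn(x)                      # hidden states = pretrained features
        return self.proj(h), h

class DNNBNF(nn.Module):
    """Back-end: feed-forward DNN with a low-dimensional bottleneck layer,
    trained to classify cross-lingual phone labels; the bottleneck
    activations are extracted as the subword-discriminative BNFs."""
    def __init__(self, in_dim=512, hidden_dim=1024, bnf_dim=40, num_phones=100):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, bnf_dim),      # bottleneck layer
        )
        self.classifier = nn.Linear(bnf_dim, num_phones)

    def forward(self, feats):
        bnf = self.encoder(feats)
        return self.classifier(bnf), bnf

# Stage 1: APC pretraining -- predict the frame `shift` steps ahead.
apc, shift = APC(), 3
fbank = torch.randn(8, 200, 40)                  # toy batch of FBANK features
pred, _ = apc(fbank[:, :-shift])
apc_loss = nn.functional.l1_loss(pred, fbank[:, shift:])

# Stage 2: DNN-BNF training on cross-lingual phone labels (random toys here,
# standing in for labels from a language-mismatched ASR system), after which
# the bottleneck outputs `bnf` are extracted as the final representation.
_, pretrained = apc(fbank)
dnn = DNNBNF()
phone_labels = torch.randint(0, 100, (8, 200))
logits, bnf = dnn(pretrained.detach())
bnf_loss = nn.functional.cross_entropy(logits.transpose(1, 2), phone_labels)
```

In this sketch the two stages are trained separately, matching the front-end/back-end split described in the abstract; whether the APC features are frozen or fine-tuned during back-end training is a design choice not specified here.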