Paper Title

Semi-Supervised Learning Based on Reference Model for Low-resource TTS

Authors

Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Abstract

Most previous neural text-to-speech (TTS) methods are based on supervised learning, which means they depend on large training datasets and struggle to achieve comparable performance under low-resource conditions. To address this issue, we propose a semi-supervised learning method for neural TTS in which labeled target data is limited, and which can also mitigate the exposure bias problem of previous auto-regressive models. Specifically, we pre-train a reference model based on FastSpeech 2 with a large amount of source data and fine-tune it on a limited target dataset. Meanwhile, pseudo labels generated by the original reference model are used to further guide the training of the fine-tuned model, providing a regularization effect and reducing overfitting to the limited target data. Experimental results show that our proposed semi-supervised learning scheme with limited target data significantly improves voice quality on test data, achieving naturalness and robustness in speech synthesis.
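The abstract describes a pseudo-label scheme: a frozen copy of the pre-trained reference model generates targets that regularize the model being fine-tuned on the limited target data. Below is a minimal sketch of that idea under stated assumptions, not the authors' code: `fine_tune_with_pseudo_labels`, `target_loader`, and the weight `lambda_pseudo` are hypothetical names, and the model is assumed to be a FastSpeech 2-style network mapping text/phoneme inputs to mel-spectrograms, trained with an L1 mel loss.

```python
# Sketch of pseudo-label regularized fine-tuning (hypothetical names; not the
# authors' released code). Assumes `pretrained_model` is an nn.Module mapping
# text inputs to mel-spectrograms, as in FastSpeech 2-style TTS.
import copy
import torch
import torch.nn.functional as F

def fine_tune_with_pseudo_labels(pretrained_model, target_loader,
                                 num_steps=10_000, lr=1e-4, lambda_pseudo=0.5):
    # Frozen copy of the pre-trained reference model: its outputs serve as
    # pseudo labels that regularize the fine-tuned model.
    reference = copy.deepcopy(pretrained_model).eval()
    for p in reference.parameters():
        p.requires_grad_(False)

    model = pretrained_model  # fine-tuned on the limited target data
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    step = 0
    while step < num_steps:
        for text, mel_target in target_loader:  # limited labeled target data
            # Pseudo labels from the frozen reference model (no gradients).
            with torch.no_grad():
                mel_pseudo = reference(text)

            mel_pred = model(text)

            # Supervised loss against ground-truth mels, plus a consistency
            # term toward the reference model's pseudo labels; the latter
            # acts as a regularizer against overfitting the small target set.
            loss_sup = F.l1_loss(mel_pred, mel_target)
            loss_pseudo = F.l1_loss(mel_pred, mel_pseudo)
            loss = loss_sup + lambda_pseudo * loss_pseudo

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
            if step >= num_steps:
                break
    return model
```

Freezing the reference copy keeps the pseudo-label targets fixed throughout fine-tuning, so the consistency term pulls the fine-tuned model toward the broadly trained reference rather than letting it overfit the limited target data.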
