Paper Title
Improving Cross-lingual Speech Synthesis with Triplet Training Scheme
Paper Authors
Paper Abstract
Recent advances in cross-lingual text-to-speech (TTS) have made it possible to synthesize speech in a language foreign to a monolingual speaker. However, there is still a large gap between the pronunciation of generated cross-lingual speech and that of native speakers in terms of naturalness and intelligibility. In this paper, a triplet training scheme is proposed to enhance cross-lingual pronunciation by allowing previously unseen content and speaker combinations to be seen during training. The proposed method introduces an extra fine-tuning stage with a triplet loss during training, which efficiently draws the pronunciation of the synthesized foreign speech closer to that of the native anchor speaker while preserving the non-native speaker's timbre. Experiments are conducted on a state-of-the-art baseline cross-lingual TTS system and its enhanced variants. All objective and subjective evaluations show that the proposed method brings significant improvement in both the intelligibility and the naturalness of the synthesized cross-lingual speech.
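
The abstract does not spell out the exact form of the triplet loss, so the following is only a minimal sketch of the standard margin-based triplet loss, not the paper's implementation. The function names, the Euclidean distance, the margin value, and the assignment of roles (anchor, positive, negative) to feature vectors are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss (hypothetical helper).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin`; otherwise it grows with the violation.
    Assumed roles: `anchor` stands for pronunciation features of a native
    anchor speaker, `positive` for features that should be pulled toward
    it, and `negative` for features that should be pushed away.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D vectors standing in for real acoustic features.
violating = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
satisfied = triplet_loss([0.0, 0.0], [0.0, 0.5], [3.0, 4.0])
```

In the first call the positive lies farther from the anchor than the negative, so the loss is positive and a gradient step would pull the positive closer. In the second call the margin is already satisfied and the loss is zero.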