Paper title

Applying Feature Underspecified Lexicon Phonological Features in Multilingual Text-to-Speech

Authors

Zhang, Cong; Zeng, Huinan; Liu, Huang; Zheng, Jiewen

Abstract

This study investigates whether phonological features derived from the Featurally Underspecified Lexicon (FUL) model can be applied in text-to-speech (TTS) systems to generate native and non-native speech in English and Mandarin. We present a mapping from ARPABET/pinyin to SAMPA/SAMPA-SC and then to phonological features. We tested whether this mapping could lead to the successful generation of native, non-native, and code-switched speech in the two languages. We ran two experiments, one with a small dataset and one with a larger dataset. The results indicate that phonological features can serve as a feasible input system for languages both in and not in the training data, although further investigation is needed to improve model performance. The results lend support to FUL by presenting successfully synthesised output, and by producing output that carries a source-language accent when synthesising a language not in the training data. The TTS process simulates the human second language acquisition process and thus also confirms FUL's ability to account for acquisition.
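
As a rough illustration of the two-stage mapping the abstract describes (phone symbols to SAMPA, then SAMPA to phonological features), the following minimal Python sketch may help. It is not the authors' implementation: the table names, the function `phones_to_feature_vectors`, and the feature labels are all assumptions made for illustration, and the feature sets shown are placeholders rather than the paper's actual FUL feature inventory. Only a handful of English ARPABET symbols are covered.

```python
# Illustrative sketch only: ARPABET -> SAMPA -> feature vector.
# The feature labels below are placeholders, NOT the paper's FUL inventory.

# Stage 1: a few English ARPABET symbols and their SAMPA counterparts.
ARPABET_TO_SAMPA = {
    "P": "p",
    "B": "b",
    "M": "m",
    "SH": "S",
    "NG": "N",
    "IY": "i",
}

# Stage 2: hypothetical feature sets for each SAMPA symbol.
SAMPA_TO_FEATURES = {
    "p": {"consonantal", "labial", "plosive"},
    "b": {"consonantal", "labial", "plosive", "voiced"},
    "m": {"consonantal", "labial", "nasal", "voiced"},
    "S": {"consonantal", "coronal", "fricative"},
    "N": {"consonantal", "dorsal", "nasal", "voiced"},
    "i": {"vocalic", "high", "voiced"},
}

# Fixed feature order so every phone maps to a vector of the same length.
FEATURE_ORDER = sorted(set().union(*SAMPA_TO_FEATURES.values()))


def strip_stress(symbol):
    """Drop a trailing ARPABET stress digit, e.g. 'IY1' -> 'IY'."""
    return symbol.rstrip("012")


def phones_to_feature_vectors(phones):
    """Convert a list of ARPABET phones into binary feature vectors."""
    vectors = []
    for phone in phones:
        sampa = ARPABET_TO_SAMPA[strip_stress(phone)]
        features = SAMPA_TO_FEATURES[sampa]
        vectors.append([1 if name in features else 0 for name in FEATURE_ORDER])
    return vectors


if __name__ == "__main__":
    # "me" as the ARPABET sequence M IY1
    for phone, vector in zip(["M", "IY1"], phones_to_feature_vectors(["M", "IY1"])):
        print(phone, vector)
```

Because every phone is reduced to the same fixed-length feature vector, a symbol from a language absent from the training data can in principle still be represented, provided its features are covered. That is the property the abstract relies on when it reports synthesising a language not in the training data.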
