Paper Title
Analysis of Voice Conversion and Code-Switching Synthesis Using VQ-VAE
Authors
Abstract
This paper analyzes the speech synthesis quality achieved when voice conversion and language code-switching are performed simultaneously with multilingual VQ-VAE speech synthesis in German, French, English and Italian. We use VQ code indices from the VQ-VAE, which represent phone information, to perform code-switching, and a VQ speaker code to perform voice conversion, in a single system with a neural vocoder. Our analysis examines several aspects of code-switching, including the number of language switches and the number of words involved in each switch. We found that synthesis quality degrades as the number of language switches within an utterance increases and as the number of words in each switch decreases. We also found some evidence of accent transfer when performing voice conversion across languages, observed when a speaker's original language differs from the language of the synthetic target utterance. We present results from our listening tests and discuss the inherent difficulties of assessing accent transfer in speech synthesis. Our work highlights some of the limitations and strengths of using a semi-supervised end-to-end system such as VQ-VAE for multilingual synthesis. It also provides insight into why multilingual speech synthesis is challenging, and we suggest some directions for expanding work in this area.
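The two code-switching variables named in the abstract (the number of language switches and the number of words in each switch) can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the authors' analysis code: the function name `switch_stats`, the per-word language tags, and the example utterance are all hypothetical.

```python
from itertools import groupby


def switch_stats(lang_tags):
    """Summarize a code-switched utterance given one language tag per word.

    Returns (n_switches, segments), where segments is a list of
    (language, word_count) pairs for each monolingual run of words.
    Hypothetical helper for illustration; not taken from the paper.
    """
    # Collapse consecutive words in the same language into one segment.
    segments = [(lang, len(list(group))) for lang, group in groupby(lang_tags)]
    # A switch happens at every boundary between two adjacent segments.
    n_switches = max(len(segments) - 1, 0)
    return n_switches, segments


# Hypothetical utterance: three English words, two German, one English, one French.
tags = ["en", "en", "en", "de", "de", "en", "fr"]
n_switches, segments = switch_stats(tags)
print(n_switches)  # 3
print(segments)    # [('en', 3), ('de', 2), ('en', 1), ('fr', 1)]
```

Under this framing, the reported degradation corresponds to utterances with a larger `n_switches` and smaller word counts per segment.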