Paper title
Speak Like a Professional: Increasing Speech Intelligibility by Mimicking Professional Announcer Voice with Voice Conversion
Authors
Abstract
In most practical scenarios, an announcement system must deliver speech messages in a noisy environment where the background noise cannot be cancelled out. The local noise reduces speech intelligibility and increases the listening effort of the listener, and hence hampers the effectiveness of the announcement system. It has been reported that the voices of professional announcers are clearer and more comprehensible than those of non-expert speakers in noisy environments. This finding suggests that speech intelligibility might be related to the speaking style of professional announcers, which can be adapted using a voice conversion method. Motivated by this idea, this paper proposes enhancing speech intelligibility in noisy environments by applying a voice conversion method to non-professional voices. We found that professional announcers and non-professional speakers fall into different clusters on the speaker embedding plane. This implies that speech intelligibility can be controlled as an independent feature of speaker individuality. To examine the advantage of the converted voice in noisy environments, we conducted experiments using test words masked by pink noise at different SNR levels. The results of objective and subjective evaluations confirm that the speech intelligibility of the converted voice is higher than that of the original voice in low-SNR conditions.
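The masking step described in the abstract, test words mixed with pink noise at a chosen SNR, can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' procedure: the helper names `pink_noise` and `mix_at_snr` are hypothetical, the pink noise is approximated by spectral shaping of white noise, a sine tone stands in for a recorded test word, and the SNR values are arbitrary since the abstract does not list them.

```python
import numpy as np

def pink_noise(n_samples, rng):
    """Approximate pink (1/f power) noise by shaping white noise in the frequency domain."""
    white = rng.standard_normal(n_samples)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n_samples)
    freqs[0] = freqs[1]              # avoid division by zero at the DC bin
    spectrum = spectrum / np.sqrt(freqs)  # amplitude ~ 1/sqrt(f), so power ~ 1/f
    return np.fft.irfft(spectrum, n=n_samples)

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so the speech-to-noise power ratio equals snr_db, then add it."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return speech + scaled_noise, scaled_noise

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample_rate = 16000                       # assumed sampling rate
    t = np.arange(sample_rate) / sample_rate
    speech = 0.1 * np.sin(2 * np.pi * 220.0 * t)  # placeholder for a test word
    noise = pink_noise(len(speech), rng)
    for snr_db in (-5.0, 0.0, 5.0):           # illustrative SNR levels only
        mixed, scaled = mix_at_snr(speech, noise, snr_db)
        measured = 10.0 * np.log10(np.mean(speech ** 2) / np.mean(scaled ** 2))
        print(f"target {snr_db:+.1f} dB, measured {measured:+.2f} dB")
```

Scaling the noise rather than the speech keeps the speech level fixed across conditions, so only the masker level changes between SNR settings.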