Paper Title

Maximum Voiced Frequency Estimation: Exploiting Amplitude and Phase Spectra

Authors

Thomas Drugman, Yannis Stylianou

Abstract

Maximum Voiced Frequency (MVF) is used in various speech models as the spectral boundary separating periodic and aperiodic components during the production of voiced sounds. Recent studies have shown that its proper estimation and modeling enhance the quality of statistical parametric speech synthesizers. Contrastingly, these same methods of MVF estimation have been reported to degrade the performance of singing voice synthesizers. This paper proposes a new approach for MVF estimation which exploits both amplitude and phase spectra. It is shown that phase conveys relevant information about the harmonicity of the voice signal, and that it can be jointly used with features derived from the amplitude spectrum. This information is further integrated into a maximum likelihood criterion which provides a decision about the MVF estimate. The proposed technique is compared to two state-of-the-art methods, and shows a superior performance in both objective and subjective evaluations. Perceptual tests indicate a drastic improvement in high-pitched voices.
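
The abstract says only that harmonicity information is integrated into a maximum likelihood criterion that decides on the MVF. The sketch below is one plausible reading of such a decision rule, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the use of a single scalar harmonicity feature per harmonic (the paper combines amplitude-derived and phase-derived features), and the univariate Gaussian class models with made-up parameters.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log-density of a univariate Gaussian (assumed class model)."""
    var = std * std
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def estimate_mvf(harmonic_freqs, features, voiced_model, unvoiced_model):
    """Pick the spectral boundary that maximizes the total log-likelihood.

    harmonic_freqs: candidate harmonic frequencies in Hz, ascending.
    features: one hypothetical harmonicity score per candidate.
    voiced_model, unvoiced_model: (mean, std) pairs, assumed pre-trained.
    Returns the frequency of the last harmonic labeled periodic,
    or 0.0 if no harmonic is labeled periodic.
    """
    n = len(features)
    ll_voiced = [gaussian_logpdf(x, *voiced_model) for x in features]
    ll_unvoiced = [gaussian_logpdf(x, *unvoiced_model) for x in features]

    best_k, best_score = 0, -math.inf
    for k in range(n + 1):
        # Hypothesis: harmonics [0, k) are periodic, [k, n) are aperiodic.
        score = sum(ll_voiced[:k]) + sum(ll_unvoiced[k:])
        if score > best_score:
            best_k, best_score = k, score

    return harmonic_freqs[best_k - 1] if best_k > 0 else 0.0


# Toy frame: 200 Hz fundamental, ten harmonics, illustrative feature values.
freqs = [200.0 * i for i in range(1, 11)]
feats = [0.90, 0.85, 0.95, 0.80, 0.90, 0.20, 0.10, 0.15, 0.25, 0.10]
mvf = estimate_mvf(freqs, feats, voiced_model=(0.9, 0.1), unvoiced_model=(0.15, 0.1))
print(mvf)
```

Scoring every candidate boundary jointly, rather than thresholding each harmonic on its own, keeps a single noisy harmonic from splitting the spectrum into alternating periodic and aperiodic bands.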
