VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration

Paper title

VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration

Authors

Liu, Haohe; Liu, Xubo; Kong, Qiuqiang; Tian, Qiao; Zhao, Yan; Wang, DeLiang; Huang, Chuanzeng; Wang, Yuxuan

Abstract

Speech restoration aims to remove distortions in speech signals. Prior methods mainly focus on a single type of distortion, such as speech denoising or dereverberation. However, speech signals can be degraded by several different distortions simultaneously in the real world. It is thus important to extend speech restoration models to deal with multiple distortions. In this paper, we introduce VoiceFixer, a unified framework for high-fidelity speech restoration. VoiceFixer restores speech from multiple distortions (e.g., noise, reverberation, and clipping) and can expand degraded speech (e.g., noisy speech) with a low bandwidth to 44.1 kHz full-bandwidth high-fidelity speech. We design VoiceFixer based on (1) an analysis stage that predicts intermediate-level features from the degraded speech, and (2) a synthesis stage that generates waveform using a neural vocoder. Both objective and subjective evaluations show that VoiceFixer is effective on severely degraded speech, such as real-world historical speech recordings. Samples of VoiceFixer are available at https://haoheliu.github.io/voicefixer.
