Noise Tokens: Learning Neural Noise Templates for Environment-Aware Speech Enhancement

Paper title

Noise Tokens: Learning Neural Noise Templates for Environment-Aware Speech Enhancement

Authors

Haoyu Li, Junichi Yamagishi

Abstract

In recent years, speech enhancement (SE) has made impressive progress thanks to the success of deep neural networks (DNNs). However, DNN-based approaches usually fail to generalize well to unseen environmental noise that was not included in training. To address this problem, we propose "noise tokens" (NTs), a set of neural noise templates that are jointly trained with the SE system. NTs dynamically capture environment variability and thus enable the DNN model to handle various environments and produce higher-quality STFT magnitudes. Experimental results show that using NTs is an effective strategy that consistently improves the generalization ability of SE systems across different DNN architectures. Furthermore, we investigate applying a state-of-the-art neural vocoder to generate waveforms instead of the traditional inverse STFT (ISTFT). Subjective listening tests show that residual noise can be significantly suppressed through mel-spectrogram correction and vocoder-based waveform synthesis.
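
The abstract does not say how the noise templates are combined into a description of the current environment. One plausible reading, assumed here purely for illustration, is a soft attention over a small bank of template vectors: a query vector summarizing the noisy utterance is scored against each template, and the softmax-weighted sum serves as a noise embedding that could condition the enhancement network. The function names, dimensions, and random initialization below are hypothetical and are not taken from the paper. In a real system the templates would be trained jointly with the SE model rather than drawn at random.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def noise_embedding(query, templates):
    """Attention-weighted sum of template vectors.

    Assumption: scaled dot-product scoring between the query and each
    template; the paper's actual formulation may differ.
    """
    dim = len(query)
    scores = [
        sum(q * t for q, t in zip(query, template)) / math.sqrt(dim)
        for template in templates
    ]
    weights = softmax(scores)
    embedding = [
        sum(w * template[i] for w, template in zip(weights, templates))
        for i in range(dim)
    ]
    return weights, embedding


# Hypothetical sizes, chosen only to keep the example small.
NUM_TEMPLATES, DIM = 4, 8
rng = random.Random(0)

# Stand-ins for learned parameters and for an encoder output.
templates = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_TEMPLATES)]
query = [rng.gauss(0, 1) for _ in range(DIM)]

weights, embedding = noise_embedding(query, templates)
print("attention weights:", [round(w, 3) for w in weights])
print("noise embedding length:", len(embedding))
```

Because the weights form a convex combination, an unseen noise type would be represented as a mixture of the learned templates instead of requiring an exact match to a training condition, which is one way to interpret the generalization claim in the abstract.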
