Multi-channel end-to-end neural network for speech enhancement, source localization, and voice activity detection

Paper title

Multi-channel end-to-end neural network for speech enhancement, source localization, and voice activity detection

Authors

Chen, Yuan, Hsu, Yicheng, Bai, Mingsian R.

Abstract

Speech enhancement and source localization have been active research topics for several decades, with a wide range of real-world applications. Recently, the Deep Complex Convolution Recurrent Network (DCCRN) has yielded impressive enhancement performance for single-channel systems. In this study, a neural beamformer consisting of a beamformer and a novel multi-channel DCCRN is proposed for speech enhancement and source localization. Complex-valued filters estimated by the multi-channel DCCRN serve as the weights of the beamformer. In addition, a one-stage learning-based procedure is employed for speech enhancement and source localization. The proposed network, composed of the multi-channel DCCRN and an auxiliary network, models the sound field while minimizing the distortionless response loss function. Simulation results show that the proposed neural beamformer is effective in enhancing speech signals, with speech quality well preserved. The proposed neural beamformer also provides source localization and voice activity detection (VAD) functions.
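
As a rough illustration of what "complex-valued filters serve as the weights of the beamformer" can mean, the sketch below applies one complex weight per microphone to a single time-frequency bin and checks a generic distortionless constraint, w^H d = 1. Everything here is an assumption made for illustration: the uniform linear array, the far-field steering vector, the function names, and the delay-and-sum weights that stand in for the filters the multi-channel DCCRN would estimate. The abstract does not specify the network's actual loss or array geometry, so this is not the authors' method.

```python
import cmath
import math


def steering_vector(num_mics, spacing_m, angle_rad, freq_hz, speed_of_sound=343.0):
    """Far-field steering vector for a uniform linear array (assumed geometry)."""
    delays = [m * spacing_m * math.cos(angle_rad) / speed_of_sound
              for m in range(num_mics)]
    return [cmath.exp(-2j * math.pi * freq_hz * tau) for tau in delays]


def apply_beamformer(weights, mic_bins):
    """Filter-and-sum output y = w^H x for one time-frequency bin."""
    return sum(w.conjugate() * x for w, x in zip(weights, mic_bins))


def distortionless_error(weights, steering):
    """|w^H d - 1|, which is zero when the target direction passes undistorted."""
    return abs(apply_beamformer(weights, steering) - 1.0)


# Hypothetical setup: 4 microphones, 5 cm spacing, source at 60 degrees, 1 kHz bin.
d = steering_vector(num_mics=4, spacing_m=0.05, angle_rad=math.pi / 3, freq_hz=1000.0)

# Delay-and-sum weights stand in for the network-estimated complex filters.
w = [dm / len(d) for dm in d]

# Noise-free observation: the source bin as seen at each microphone.
source_bin = 0.8 + 0.3j
x = [source_bin * dm for dm in d]

y = apply_beamformer(w, x)
print(abs(y - source_bin), distortionless_error(w, d))
```

In a learned system the weights would vary per time frame and frequency bin and would be trained on noisy mixtures; the constraint check above only shows the property that a distortionless-response objective encourages.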
