Improving Dual-Microphone Speech Enhancement by Learning Cross-Channel Features with Multi-Head Attention

Paper Title

Improving Dual-Microphone Speech Enhancement by Learning Cross-Channel Features with Multi-Head Attention

Authors

Xinmeng Xu, Rongzhi Gu, Yuexian Zou

Abstract

Hand-crafted spatial features, such as inter-channel intensity difference (IID) and inter-channel phase difference (IPD), play a fundamental role in recent deep-learning-based dual-microphone speech enhancement (DMSE) systems. However, learning the mutual relationship between hand-designed spatial features and spectral features is hard in end-to-end DMSE. In this work, a novel architecture for DMSE using a multi-head cross-attention based convolutional recurrent network (MHCA-CRN) is presented. The proposed MHCA-CRN model includes a channel-wise encoding structure for preserving intra-channel features and a multi-head cross-attention mechanism for fully exploiting cross-channel features. In addition, the proposed approach formulates the decoder with an extra SNR estimator that estimates frame-level SNR under a multi-task learning framework, which is expected to avoid the speech distortion introduced by the end-to-end DMSE module. Finally, a spectral gain function is adopted to further suppress unnatural residual noise. Experimental results demonstrate superior performance of the proposed model against several state-of-the-art models.
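To make the two hand-crafted features named in the abstract concrete, the following is a minimal sketch of how IID and IPD are commonly computed from a pair of complex short-time spectra. The definitions used here (log-magnitude ratio in dB for IID, phase of the cross-spectrum for IPD) are common conventions and an assumption on our part; the abstract does not state which variant the authors use. The function name `iid_ipd` and the `eps` parameter are hypothetical, chosen for illustration only.

```python
import cmath
import math


def iid_ipd(x_left, x_right, eps=1e-8):
    """Return per-bin (IID in dB, IPD in radians) for two complex spectra.

    Assumed definitions (not taken from the paper):
      IID = 20 * log10(|X_left| / |X_right|)
      IPD = angle(X_left * conj(X_right)), wrapped to (-pi, pi]
    """
    iid, ipd = [], []
    for a, b in zip(x_left, x_right):
        # eps guards against division by zero and log of zero in silent bins.
        iid.append(20.0 * math.log10((abs(a) + eps) / (abs(b) + eps)))
        # The phase of the cross-spectrum is the wrapped phase difference.
        ipd.append(cmath.phase(a * b.conjugate()))
    return iid, ipd


# Two toy frequency bins: the first differs only in level,
# the second only in phase (a quarter cycle).
left = [2 + 0j, 1j]
right = [1 + 0j, 1 + 0j]
iid, ipd = iid_ipd(left, right)
```

In this toy input the first bin has a level difference and no phase difference, while the second has equal magnitudes and a quarter-cycle phase lead on the left channel. The approach described in the abstract is motivated by the difficulty of relating such fixed features to spectral features in an end-to-end model, which is why it learns cross-channel features through attention instead.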
