Paper Title

Improving Dual-Microphone Speech Enhancement by Learning Cross-Channel Features with Multi-Head Attention

Authors

Xinmeng Xu, Rongzhi Gu, Yuexian Zou

Abstract

Hand-crafted spatial features, such as the inter-channel intensity difference (IID) and the inter-channel phase difference (IPD), play a fundamental role in recent deep-learning-based dual-microphone speech enhancement (DMSE) systems. However, learning the mutual relationship between artificially designed spatial features and spectral features is difficult in end-to-end DMSE. In this work, a novel architecture for DMSE using a multi-head cross-attention based convolutional recurrent network (MHCA-CRN) is presented. The proposed MHCA-CRN model includes a channel-wise encoding structure for preserving intra-channel features and a multi-head cross-attention mechanism for fully exploiting cross-channel features. In addition, the proposed approach formulates the decoder with an extra SNR estimator that estimates the frame-level SNR under a multi-task learning framework, which is expected to avoid the speech distortion caused by the end-to-end DMSE module. Finally, a spectral gain function is adopted to further suppress unnatural residual noise. Experimental results demonstrate that the proposed model outperforms several state-of-the-art models.
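For readers unfamiliar with the hand-crafted features named in the abstract, the following is a minimal sketch of how IID and IPD are commonly computed from a two-channel recording. It is not taken from the paper: the function names (`stft`, `spatial_features`), the frame length and hop size, the dB scaling of IID, and the sign convention of IPD (channel 1 relative to channel 2) are all illustrative assumptions.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Minimal STFT (assumed settings): Hann-windowed frames followed by a real FFT.

    Returns a complex array of shape (n_frames, n_fft // 2 + 1).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)

def spatial_features(x1, x2, n_fft=512, hop=256, eps=1e-8):
    """Compute IID (in dB) and IPD (in radians) per time-frequency bin.

    Hypothetical helper for illustration; conventions vary between papers.
    """
    X1 = stft(x1, n_fft, hop)
    X2 = stft(x2, n_fft, hop)
    # IID: log-ratio of the two channel magnitudes (eps avoids division by zero).
    iid = 20.0 * np.log10((np.abs(X1) + eps) / (np.abs(X2) + eps))
    # IPD: phase of channel 1 minus phase of channel 2, wrapped to [-pi, pi].
    ipd = np.angle(X1 * np.conj(X2))
    return iid, ipd

# Toy two-microphone signal: channel 2 is an attenuated copy of channel 1.
rng = np.random.default_rng(0)
s = rng.standard_normal(16000)
iid, ipd = spatial_features(s, 0.5 * s)
print(iid.shape, ipd.shape)
```

In this toy case the second channel is exactly half the amplitude of the first, so the IID is roughly 6 dB in every bin and the IPD is zero. The abstract's argument is that relating such fixed features to spectral features is hard to learn end to end, which is the motivation for learning cross-channel features with attention instead.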
