Multi-channel target speech enhancement based on ERB-scaled spatial coherence features

Paper title

Multi-channel target speech enhancement based on ERB-scaled spatial coherence features

Authors

Hsu, Yicheng; Lee, Yonghan; Bai, Mingsian R.

Abstract

Recently, speech enhancement technologies that are based on deep learning have received considerable research attention. If the spatial information in microphone signals is exploited, microphone arrays can be advantageous under some adverse acoustic conditions compared with single-microphone systems. However, multichannel speech enhancement is often performed in the short-time Fourier transform (STFT) domain, which renders the enhancement approach computationally expensive. To remedy this problem, we propose a novel equivalent rectangular bandwidth (ERB)-scaled spatial coherence feature that is dependent on the target speaker activity between two ERB bands. Experiments conducted using a four-microphone array in a reverberant environment, which involved speech interference, demonstrated the efficacy of the proposed system. This study also demonstrated that a network that was trained with the ERB-scaled spatial feature was robust against variations in the geometry and number of the microphones in the array.
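The abstract does not define the feature itself, so the following is only a rough sketch of the general idea of replacing per-bin STFT quantities with a small number of ERB-band quantities. It assumes the Glasberg-Moore ERB-rate formula for the band layout and uses a generic band-pooled magnitude coherence between two microphone spectra for a single frame; this is not the authors' target-dependent feature. The function names and the choice of 32 bands, 257 bins, and a 16 kHz sample rate are hypothetical.

```python
import cmath
import math


def hz_to_erb_rate(f_hz):
    """ERB-rate (ERB number) of a frequency in Hz, Glasberg-Moore formula."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


def erb_band_edges(n_bins, sample_rate, n_bands):
    """Return n_bands + 1 bin indices splitting [0, n_bins) into bands of
    roughly equal width on the ERB-rate scale (an assumed layout)."""
    nyquist = sample_rate / 2.0
    top = hz_to_erb_rate(nyquist)
    edges = []
    for b in range(n_bands + 1):
        f_hz = erb_rate_to_hz(top * b / n_bands)
        edges.append(round(f_hz / nyquist * (n_bins - 1)))
    edges[-1] = n_bins  # the last band also covers the Nyquist bin
    # Force strictly increasing edges so that no band is empty.
    for b in range(1, n_bands + 1):
        edges[b] = max(edges[b], edges[b - 1] + 1)
    if edges[-1] != n_bins:
        raise ValueError("too many bands for this number of bins")
    return edges


def band_coherence(x1, x2, edges, eps=1e-12):
    """Band-pooled magnitude coherence between two one-frame spectra.

    For each band: |sum X1 conj(X2)| / sqrt(sum |X1|^2 * sum |X2|^2).
    By the Cauchy-Schwarz inequality each value lies in [0, 1].
    """
    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        cross = sum(a * b.conjugate() for a, b in zip(x1[lo:hi], x2[lo:hi]))
        p1 = sum(abs(a) ** 2 for a in x1[lo:hi])
        p2 = sum(abs(b) ** 2 for b in x2[lo:hi])
        feats.append(abs(cross) / (math.sqrt(p1 * p2) + eps))
    return feats


# Hypothetical sizes: 512-point STFT at 16 kHz, pooled into 32 bands.
n_bins, sample_rate, n_bands = 257, 16000, 32
edges = erb_band_edges(n_bins, sample_rate, n_bands)

# Synthetic spectra: mic2 is mic1 with a frequency-dependent phase shift.
mic1 = [cmath.rect(1.0, 0.05 * k) for k in range(n_bins)]
mic2 = [s * cmath.rect(1.0, -0.02 * k) for k, s in enumerate(mic1)]

features = band_coherence(mic1, mic2, edges)
print(len(features), "band features from", n_bins, "STFT bins")
```

Under these assumptions, a network would receive one value per ERB band and microphone pair instead of one per STFT bin, which is the kind of input-size reduction the abstract points to when it calls STFT-domain processing computationally expensive.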
