Low-latency Monaural Speech Enhancement with Deep Filter-bank Equalizer

Paper Title

Low-latency Monaural Speech Enhancement with Deep Filter-bank Equalizer

Authors

Chengshi Zheng, Wenzhe Liu, Andong Li, Yuxuan Ke, Xiaodong Li

Abstract

It is highly desirable that speech enhancement algorithms can achieve good performance while keeping low latency for many applications, such as digital hearing aids, acoustically transparent hearing devices, and public address systems. To improve the performance of traditional low-latency speech enhancement algorithms, a deep filter-bank equalizer (FBE) framework was proposed, which integrated a deep learning-based subband noise reduction network with a deep learning-based shortened digital filter mapping network. In the first network, a deep learning model was trained with a controllable small frame shift to satisfy the low-latency demand, i.e., $\le$ 4 ms, so as to obtain (complex) subband gains, which could be regarded as an adaptive digital filter in each frame. In the second network, to reduce the latency, this adaptive digital filter was implicitly shortened by a deep learning-based framework, and was then applied to noisy speech to reconstruct the enhanced speech without the overlap-add method. Experimental results on the WSJ0-SI84 corpus indicated that the proposed deep FBE with only 4-ms latency achieved much better performance than traditional low-latency speech enhancement algorithms in terms of the indices such as PESQ, STOI, and the amount of noise reduction.
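The filter-bank equalizer idea in the abstract (per-frame subband gains treated as an adaptive digital filter, shortened, and applied to the noisy signal in the time domain without overlap-add) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `gains_to_fir` and `apply_fbe`, the hop size, and the filter length are hypothetical, the gains are assumed real-valued, and a simple truncate-and-window step stands in for the learned filter-shortening network.

```python
import numpy as np


def gains_to_fir(gains, filter_len):
    """Convert real subband gains (one per rFFT bin) into a short FIR filter.

    Truncation plus a Hann taper is a classical stand-in here for the
    deep learning-based shortening described in the abstract.
    """
    # Zero-phase impulse response, circularly symmetric around index 0.
    h = np.fft.irfft(gains)
    # Shift so the main lobe sits at the centre of the first filter_len taps.
    h = np.roll(h, filter_len // 2)
    # Keep filter_len taps and taper them; filter_len is assumed odd.
    return h[:filter_len] * np.hanning(filter_len)


def apply_fbe(noisy, gains_per_frame, hop, filter_len=33):
    """Filter the signal block by block with a time-varying FIR filter.

    No overlap-add is used: each block of `hop` samples is filtered
    directly in the time domain with that frame's shortened filter.
    """
    out = np.zeros(len(noisy))
    # Prepend zeros so the first block has a full filter history.
    padded = np.concatenate([np.zeros(filter_len - 1), noisy])
    for i, gains in enumerate(gains_per_frame):
        start = i * hop
        if start >= len(noisy):
            break
        stop = min(start + hop, len(noisy))
        fir = gains_to_fir(gains, filter_len)
        # Each output sample uses the current and previous filter_len - 1 inputs.
        segment = padded[start:stop + filter_len - 1]
        out[start:stop] = np.convolve(segment, fir, mode="valid")
    return out
```

In this sketch the algorithmic delay comes only from the filter itself (`filter_len // 2` samples for the centred, symmetric filter), which is why shortening the filter lowers latency. With all gains equal to one, the output is simply the input delayed by that amount.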

Paper Title