Paper Title

TFCN: Temporal-Frequential Convolutional Network for Single-Channel Speech Enhancement

Authors

Xupeng Jia, Dongmei Li

Abstract

Deep-learning-based single-channel speech enhancement trains a neural network to predict the clean speech signal. Popular network structures for this task include TCNN, UNet, and WaveNet. However, these structures usually contain millions of parameters, which is an obstacle for mobile applications. In this work, we propose a lightweight neural network for speech enhancement named TFCN. It is a temporal-frequential convolutional network built from dilated convolutions and depth-separable convolutions. We evaluate TFCN in terms of short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), and a set of composite metrics named Csig, Cbak, and Covl. Experimental results show that, compared with TCN and several other state-of-the-art algorithms, the proposed structure achieves comparable performance with only 93,000 parameters. Further improvement can be achieved, at the cost of more parameters, by introducing dense connections and replacing depth-separable convolutions with normal ones. Experiments also show that the proposed structure works well in both causal and non-causal settings.
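The abstract attributes the small model size to depth-separable convolutions and the temporal context to dilated convolutions. The sketch below illustrates both effects with plain arithmetic. The channel count, kernel size, and dilation schedule are hypothetical values chosen for illustration; they are not taken from the paper, and the helper names are ours.

```python
def conv1d_params(c_in, c_out, kernel_size, bias=True):
    """Parameter count of a standard 1-D convolution."""
    return c_in * c_out * kernel_size + (c_out if bias else 0)


def separable_conv1d_params(c_in, c_out, kernel_size, bias=True):
    """Parameter count of a depthwise convolution (one filter per input
    channel) followed by a pointwise (kernel size 1) convolution."""
    depthwise = c_in * kernel_size + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def receptive_field(kernel_size, dilations):
    """Receptive field, in frames, of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical sizes for illustration only; not the configuration from the paper.
channels, kernel = 64, 3
dilations = [1, 2, 4, 8]

standard = conv1d_params(channels, channels, kernel)
separable = separable_conv1d_params(channels, channels, kernel)
rf = receptive_field(kernel, dilations)

print(f"standard conv params:  {standard}")
print(f"separable conv params: {separable}")
print(f"receptive field of the dilated stack: {rf} frames")
```

Under these assumed sizes, the separable layer needs roughly a third of the parameters of the standard layer, while doubling the dilation at each layer grows the receptive field exponentially with depth at no extra parameter cost. This is consistent with the trade-off the abstract describes, where swapping in normal convolutions improves results at the cost of more parameters.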
