Paper Title
A deep representation learning speech enhancement method using $\beta$-VAE
Paper Authors
Paper Abstract
In previous work, we proposed a variational autoencoder (VAE) based Bayesian permutation training speech enhancement (SE) method (PVAE), which showed that deep representation learning (DRL) can improve the SE performance of traditional deep neural network (DNN) based methods. Building on that work, in this paper we propose to use a $\beta$-VAE to further improve PVAE's representation learning ability. More specifically, our $\beta$-VAE improves PVAE's capacity to disentangle different latent variables from the observed signal without the trade-off between disentanglement and signal reconstruction that widely exists in previous $\beta$-VAE algorithms. Unlike previous $\beta$-VAE algorithms, the proposed $\beta$-VAE strategy can also be used to optimize the DNN's structure, which means the proposed method not only improves PVAE's SE performance but also reduces the number of PVAE training parameters. Experimental results show that the proposed method learns better speech and noise latent representations than PVAE, and also achieves a higher scale-invariant signal-to-distortion ratio, speech quality, and speech intelligibility.
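For context, the standard $\beta$-VAE objective (Higgins et al.) weights the KL term of the VAE evidence lower bound by a factor $\beta$; this is shown only as background, and the paper's PVAE-specific formulation may differ in detail:

$$\mathcal{L}(\theta,\phi;x) = \mathbb{E}_{q_{\phi}(z \mid x)}\left[\log p_{\theta}(x \mid z)\right] - \beta \, D_{\mathrm{KL}}\left(q_{\phi}(z \mid x) \,\|\, p(z)\right)$$

Setting $\beta = 1$ recovers the ordinary VAE objective, while $\beta > 1$ typically encourages stronger disentanglement of the latent variables at the cost of reconstruction quality; this is the trade-off the abstract refers to.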