AmbiSep: Ambisonic-to-Ambisonic Reverberant Speech Separation Using Transformer Networks

Paper Title

AmbiSep: Ambisonic-to-Ambisonic Reverberant Speech Separation Using Transformer Networks

Authors

Adrian Herzog, Srikanth Raj Chetupalli, Emanuël A. P. Habets

Abstract

Consider a multichannel Ambisonic recording containing a mixture of several reverberant speech signals. Retrieving the reverberant Ambisonic signals corresponding to the individual speech sources blindly from the mixture is a challenging task, as it requires estimating multiple signal channels for each source. In this work, we propose AmbiSep, a deep neural network-based plane-wave domain masking approach to solve this task. The masking network uses learned feature representations and transformers in a triple-path processing configuration. We train and evaluate the proposed network architecture on a spatialized WSJ0-2mix dataset, and show that the method achieves a multichannel scale-invariant signal-to-distortion ratio improvement of 17.7 dB on the blind test set, while preserving the spatial characteristics of the separated sounds.
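The headline number in the abstract is a multichannel scale-invariant signal-to-distortion ratio (SI-SDR) improvement. The abstract does not spell out how the multichannel variant is computed, so the following is only a minimal sketch under one plausible assumption: a single scale factor is fitted jointly over all Ambisonic channels of a source. The function names `si_sdr_db` and `si_sdr_improvement_db` are hypothetical and are not taken from the paper.

```python
import math


def si_sdr_db(reference, estimate):
    """Scale-invariant SDR in dB for multichannel signals.

    Both arguments are lists of channels, each channel a list of samples.
    ASSUMPTION: one scale factor is shared by all channels, which is
    implemented here by flattening the channels into one long vector.
    """
    ref = [s for channel in reference for s in channel]
    est = [s for channel in estimate for s in channel]
    if len(ref) != len(est):
        raise ValueError("reference and estimate must have the same shape")

    ref_energy = sum(r * r for r in ref)
    if ref_energy == 0.0:
        raise ValueError("reference signal has zero energy")

    # Least-squares scale that best maps the reference onto the estimate.
    alpha = sum(r * e for r, e in zip(ref, est)) / ref_energy
    target = [alpha * r for r in ref]

    target_energy = sum(t * t for t in target)
    distortion_energy = sum((e - t) ** 2 for e, t in zip(est, target))
    if distortion_energy == 0.0:
        return math.inf
    if target_energy == 0.0:
        # Estimate is orthogonal to the reference: no target component.
        return -math.inf
    return 10.0 * math.log10(target_energy / distortion_energy)


def si_sdr_improvement_db(reference, estimate, mixture):
    """SI-SDR of the separated estimate minus SI-SDR of the unprocessed mixture."""
    return si_sdr_db(reference, estimate) - si_sdr_db(reference, mixture)
```

Under this assumption, rescaling the estimate by any nonzero constant leaves the score unchanged, and the improvement figure is the gain of the separated output over using the unprocessed mixture as the estimate.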
