Paper Title
Distributed speech separation in spatially unconstrained microphone arrays
Paper Authors
Paper Abstract
Speech separation with several speakers is a challenging task because of the non-stationarity of speech and the strong signal similarity between interfering sources. Current state-of-the-art solutions can separate the different sources well using sophisticated deep neural networks that are tedious to train. When several microphones are available, spatial information can be exploited to design much simpler algorithms to discriminate speakers. We propose a distributed algorithm that can process spatial information in a spatially unconstrained microphone array. The algorithm relies on a convolutional recurrent neural network that can exploit the signal diversity from the distributed nodes. In a typical meeting-room scenario, this algorithm captures an estimate of each source in a first step and propagates it over the microphone array to improve the separation performance in a second step. We show that this approach performs even better when the number of sources and nodes increases. We also study the influence of a mismatch in the number of sources between the training and testing conditions.
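To make the two-step idea concrete, below is a minimal PyTorch sketch of a distributed, mask-based separation scheme: in step 1 each node runs a convolutional recurrent network on its local mixture only; in step 2 a second network at each node also receives the source estimates shared by the other nodes and refines the local estimate. All module names, shapes, and the mask-based refinement here are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of a two-step distributed separation scheme.
# Step 1: each node separates using only its own mixture.
# Step 2: each node refines its estimate with the estimates
#         broadcast by the other nodes of the array.
# Names, shapes, and hyperparameters are illustrative, not the paper's code.

import torch
import torch.nn as nn

class ConvRecurrentSeparator(nn.Module):
    """Minimal convolutional recurrent mask estimator (illustrative)."""
    def __init__(self, n_freq=257, extra_inputs=0, hidden=128):
        super().__init__()
        in_ch = 1 + extra_inputs  # local spectrogram (+ shared estimates)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.rnn = nn.GRU(n_freq, hidden, batch_first=True, bidirectional=True)
        self.mask = nn.Sequential(nn.Linear(2 * hidden, n_freq), nn.Sigmoid())

    def forward(self, specs):                    # specs: (batch, channels, time, freq)
        feats = self.conv(specs).squeeze(1)      # -> (batch, time, freq)
        hidden, _ = self.rnn(feats)              # -> (batch, time, 2*hidden)
        return self.mask(hidden)                 # time-frequency mask in [0, 1]

# Step 1: separation from the local mixture only.
step1 = ConvRecurrentSeparator(extra_inputs=0)
# Step 2: the refinement network also sees estimates from two other nodes.
step2 = ConvRecurrentSeparator(extra_inputs=2)

local_mix = torch.randn(1, 1, 100, 257).abs()    # local magnitude STFT (dummy)
mask1 = step1(local_mix).unsqueeze(1)
local_est = mask1 * local_mix                    # node's first-pass estimate
shared = torch.randn(1, 2, 100, 257).abs()       # estimates from other nodes (dummy)
mask2 = step2(torch.cat([local_mix, shared], dim=1)).unsqueeze(1)
refined = mask2 * local_mix                      # refined second-pass estimate
```

In this sketch the only change between the two steps is the number of input channels: the second pass conditions the same convolutional recurrent architecture on the estimates exchanged across nodes, which is one plausible way to exploit the signal diversity of a spatially unconstrained array.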