Paper Title
Temporarily-Aware Context Modelling using Generative Adversarial Networks for Speech Activity Detection
Paper Authors
Paper Abstract
This paper presents a novel joint learning framework for Speech Activity Detection (SAD), inspired by the recent success of multi-task learning approaches in the speech processing domain. We utilise generative adversarial networks to automatically learn a loss function for jointly predicting the frame-wise speech/non-speech classifications together with the next audio segment. To exploit the temporal relationships within the input signal, we propose a temporal discriminator which aims to ensure that the predicted signal is temporally consistent. We evaluate the proposed framework on multiple public benchmarks, including NIST OpenSAT'17, AMI Meeting and HAVIC, where we demonstrate that it outperforms state-of-the-art SAD approaches. Furthermore, our cross-database evaluations demonstrate the robustness of the proposed approach across different languages, accents, and acoustic environments.
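To make the idea of a learned loss with an additional temporal discriminator more concrete, the following is a minimal sketch of how the adversarial objectives might be combined. It is not the paper's formulation: the function names (`bce`, `generator_loss`, `discriminator_loss`), the unweighted sum of the two adversarial terms, and the use of the standard non-saturating GAN loss are all assumptions made for illustration, and the discriminator outputs are passed in as plain probability arrays rather than produced by real networks.

```python
import numpy as np


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 targets."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(probs)
                          + (1.0 - targets) * np.log(1.0 - probs)))


def discriminator_loss(d_real, d_fake):
    """Standard GAN discriminator loss: real samples -> 1, generated -> 0."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))


def generator_loss(d_fake, d_temporal_fake):
    """Non-saturating generator loss summed over two discriminators.

    d_fake: probabilities from a (hypothetical) discriminator judging the
        joint output, i.e. frame-wise labels plus the predicted next segment.
    d_temporal_fake: probabilities from a (hypothetical) temporal
        discriminator judging whether the predicted sequence is consistent.
    The equal weighting of the two terms is an assumption.
    """
    return (bce(d_fake, np.ones_like(d_fake))
            + bce(d_temporal_fake, np.ones_like(d_temporal_fake)))


if __name__ == "__main__":
    # Toy discriminator scores for a batch of four generated samples.
    d_fake = np.array([0.2, 0.4, 0.3, 0.1])
    d_temporal_fake = np.array([0.6, 0.5, 0.7, 0.4])
    print("generator loss:", generator_loss(d_fake, d_temporal_fake))
```

In this sketch the generator is penalised whenever either discriminator assigns a low "real" probability to its output, so the training signal comes from the discriminators rather than from a hand-designed loss on the predictions.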