Paper title
Are you wearing a mask? Improving mask detection from speech using augmentation by cycle-consistent GANs
Paper authors
Paper abstract
Detecting from speech whether a person is wearing a face mask is useful for modelling speech in forensic investigations, in communication between surgeons, and among people protecting themselves against infectious diseases such as COVID-19. In this paper, we propose a novel data augmentation approach for mask detection from speech. Our approach is based on (i) training Generative Adversarial Networks (GANs) with a cycle-consistency loss to translate unpaired utterances between two classes (with mask and without mask), and (ii) generating new training utterances using the cycle-consistent GANs, assigning the opposite label to each translated utterance. Original and translated utterances are converted into spectrograms, which are provided as input to a set of ResNet neural networks of various depths. The networks are combined into an ensemble through a Support Vector Machine (SVM) classifier. With this system, we participated in the Mask Sub-Challenge (MSC) of the INTERSPEECH 2020 Computational Paralinguistics Challenge, surpassing the baseline proposed by the organizers by 2.8%. Our data augmentation technique provided a performance boost of 0.9% on the private test set. Furthermore, we show that our data augmentation approach yields better results than other baseline and state-of-the-art augmentation methods.
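The two augmentation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact loss or data layout, so the code assumes the L1 form of cycle-consistency commonly used with cycle-consistent GANs, represents a spectrogram as a nested list of floats, and uses hypothetical names throughout (`MASK`, `NO_MASK`, `cycle_consistency_loss`, `augment`, and the toy translators that stand in for trained GAN generators).

```python
from typing import Callable, List, Tuple

# Toy stand-in for a spectrogram: rows are frequency bins, columns are frames.
Spectrogram = List[List[float]]
Translator = Callable[[Spectrogram], Spectrogram]
Sample = Tuple[Spectrogram, int]

MASK, NO_MASK = 1, 0  # hypothetical label encoding, not taken from the paper


def l1_distance(a: Spectrogram, b: Spectrogram) -> float:
    """Mean absolute difference between two equally shaped spectrograms."""
    diffs = [abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def cycle_consistency_loss(
    x_mask: Spectrogram,
    x_no_mask: Spectrogram,
    to_no_mask: Translator,
    to_mask: Translator,
) -> float:
    """Translating to the other class and back should recover the input."""
    forward = l1_distance(to_mask(to_no_mask(x_mask)), x_mask)
    backward = l1_distance(to_no_mask(to_mask(x_no_mask)), x_no_mask)
    return forward + backward


def augment(
    samples: List[Sample], to_no_mask: Translator, to_mask: Translator
) -> List[Sample]:
    """Keep the originals and add one translated copy of each with the opposite label."""
    augmented = list(samples)
    for spec, label in samples:
        if label == MASK:
            augmented.append((to_no_mask(spec), NO_MASK))
        else:
            augmented.append((to_mask(spec), MASK))
    return augmented


# Toy translators (assumption): a real system would use trained GAN generators.
def toy_to_no_mask(spec: Spectrogram) -> Spectrogram:
    return [[v + 1.0 for v in row] for row in spec]


def toy_to_mask(spec: Spectrogram) -> Spectrogram:
    return [[v - 1.0 for v in row] for row in spec]


if __name__ == "__main__":
    data = [([[0.0, 1.0]], MASK), ([[2.0, 3.0]], NO_MASK)]
    for spec, label in augment(data, toy_to_no_mask, toy_to_mask):
        print(label, spec)
```

Because the toy translators are exact inverses of each other, their cycle loss is zero. Trained generators would only approximate this, which is why the loss is minimised during GAN training rather than assumed. The augmented set doubles the number of training samples, and each translated spectrogram carries the label of the class it was translated into.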