Paper title
Representation Selective Self-distillation and wav2vec 2.0 Feature Exploration for Spoof-aware Speaker Verification
Paper authors
Paper abstract
Text-to-speech and voice conversion systems have improved to the point where they can produce synthetic speech almost indistinguishable from bona fide human speech. Countermeasures (CM) that protect automatic speaker verification (ASV) systems against synthetic voice attacks have therefore become increasingly important. Nonetheless, most end-to-end spoofing detection networks are black-box systems, and it remains unclear which representations are effective for finding spoofing artifacts. In this paper, we examine which feature space of wav2vec 2.0 can effectively represent synthetic artifacts, and study which architecture can effectively utilize that space. Our study allows us to analyze which attributes of speech signals are advantageous for CM systems. The proposed CM system achieves a 0.31% equal error rate (EER) on the ASVspoof 2019 LA evaluation set for the spoof detection task. We further propose a simple yet effective spoofing-aware speaker verification (SASV) method that takes advantage of the disentangled representations from our countermeasure system. Evaluation on the SASV Challenge 2022 database shows an SASV EER of 1.08%. Quantitative analysis shows that using the explored feature space of wav2vec 2.0 benefits both spoofing CM and SASV.
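Both headline results above are reported as equal error rates. As a rough illustration of what that metric measures, the sketch below computes an EER from two lists of detector scores. It is not the official ASVspoof or SASV scoring tool. The function name `equal_error_rate`, the convention that a higher score means "more likely bona fide", and the toy scores are all assumptions made for this example.

```python
from typing import Sequence, Tuple


def equal_error_rate(
    bona_fide: Sequence[float], spoof: Sequence[float]
) -> Tuple[float, float]:
    """Return (eer, threshold); assumes higher scores mean more bona fide."""
    if not bona_fide or not spoof:
        raise ValueError("both score lists must be non-empty")
    best_gap, best_eer, best_thr = float("inf"), 1.0, 0.0
    # Only observed scores need to be tried as candidate thresholds.
    for thr in sorted(set(bona_fide) | set(spoof)):
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < thr for s in bona_fide) / len(bona_fide)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= thr for s in spoof) / len(spoof)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


# Toy scores (hypothetical, not taken from the paper).
print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1]))  # (0.25, 0.6)
```

In the toy example, one of four bona fide trials is rejected and one of four spoofed trials is accepted at threshold 0.6, so the EER is 25%. A reported EER of 0.31% means the two error rates meet at about 3 errors per 1,000 trials of each kind.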