ConvNeXt Based Neural Network for Audio Anti-Spoofing

Paper Title

ConvNeXt Based Neural Network for Audio Anti-Spoofing

Authors

Ma, Qiaowei; Zhong, Jinghui; Yang, Yitao; Liu, Weiheng; Gao, Ying; Ng, Wing W. Y.

Abstract

With the rapid development of speech conversion and speech synthesis algorithms, automatic speaker verification (ASV) systems are vulnerable to spoofing attacks. In recent years, researchers have proposed a number of anti-spoofing methods based on hand-crafted features. However, using hand-crafted features rather than the raw waveform loses implicit information that is useful for anti-spoofing. Inspired by the promising performance of ConvNeXt in image classification tasks, we revise the ConvNeXt network architecture and propose a lightweight end-to-end anti-spoofing model. By integrating a channel attention block and using the focal loss function, the proposed model can focus on the most informative sub-bands of speech representations and on the difficult samples that are hard to classify. Experiments show that our proposed system achieves an equal error rate (EER) of 0.64% and a min-tDCF of 0.0187 on the ASVspoof 2019 LA evaluation dataset, which outperforms the state-of-the-art systems.
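The abstract says the focal loss lets the model concentrate on samples that are hard to classify. A minimal sketch of a binary focal loss for a single sample illustrates the idea. The function name `focal_loss`, the label convention (1 for bona fide, 0 for spoof), and the defaults `gamma=2.0` and `alpha=0.25` are assumptions for illustration. The defaults come from the original focal loss formulation, not from this paper, whose settings the abstract does not state.

```python
import math


def focal_loss(p_bonafide, label, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample (illustrative sketch).

    p_bonafide: predicted probability of the bona fide class.
    label: 1 for bona fide, 0 for spoof (assumed convention).
    gamma, alpha: assumed hyperparameters, not taken from the paper.
    """
    # Clamp the probability so that log() never receives 0.
    eps = 1e-12
    p = min(max(p_bonafide, eps), 1.0 - eps)

    # p_t is the probability the model assigns to the true class.
    p_t = p if label == 1 else 1.0 - p
    # alpha_t balances the two classes.
    alpha_t = alpha if label == 1 else 1.0 - alpha

    # (1 - p_t) ** gamma shrinks the loss of well-classified samples,
    # so hard samples dominate the total.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = focal_loss(0.95, 1)  # confident and correct
    hard = focal_loss(0.60, 1)  # correct but uncertain
    print(f"easy sample loss: {easy:.6f}")
    print(f"hard sample loss: {hard:.6f}")
```

With `gamma=0` the modulating factor becomes 1 and the expression reduces to a class-weighted cross-entropy. Larger values of `gamma` push more of the total loss onto the uncertain samples.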

Paper Title