EDITnet: A Lightweight Network for Unsupervised Domain Adaptation in Speaker Verification

Paper Title

EDITnet: A Lightweight Network for Unsupervised Domain Adaptation in Speaker Verification

Authors

Jingyu Li, Wei Liu, Tan Lee

Abstract

Performance degradation caused by language mismatch is a common problem when a speaker verification system is applied to speech data in different languages. This paper proposes a domain transfer network, named EDITnet, to alleviate the language-mismatch problem on speaker embeddings without requiring speaker labels. The network leverages a conditional variational auto-encoder to transfer embeddings from the target domain into the source domain. A self-supervised learning strategy is imposed on the transferred embeddings to increase the cosine distance between embeddings from different speakers. During EDITnet training, the embedding extraction model is kept fixed without fine-tuning, which makes the training efficient and low-cost. Experiments on VoxCeleb and CN-Celeb show that the embeddings transferred by EDITnet outperform the un-transferred ones by around 30% with the ECAPA-TDNN512. Performance improvement can also be achieved with other embedding extraction models, such as TDNN and SE-ResNet34.
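The abstract does not spell out the self-supervised objective, so the following is only a minimal sketch of one way to express "increase the cosine distance between embeddings from different speakers". It assumes, without confirmation from the paper, that every embedding in a batch is treated as a different speaker and that the quantity to minimize is the average pairwise cosine similarity. The function names `cosine_similarity` and `mean_pairwise_similarity` are hypothetical and are not taken from the authors' code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(embeddings):
    """Average cosine similarity over all distinct pairs in a batch.

    Assumption (not from the paper): each embedding is treated as a
    different speaker, so a lower value means the embeddings are
    further apart in cosine distance.
    """
    n = len(embeddings)
    if n < 2:
        raise ValueError("need at least two embeddings")
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += cosine_similarity(embeddings[i], embeddings[j])
            pairs += 1
    return total / pairs


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real speaker embeddings are much larger.
    batch = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    print(mean_pairwise_similarity(batch))
```

In an actual training loop, a value like this would be computed on the transferred embeddings with an automatic-differentiation framework and combined with the auto-encoder losses; the pure-Python version here only illustrates the quantity being reduced.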
