Exploring Voice Conversion based Data Augmentation in Text-Dependent Speaker Verification

Paper Title

Exploring Voice Conversion based Data Augmentation in Text-Dependent Speaker Verification

Paper Authors

Qin, Xiaoyi, Yang, Yaogen, Yang, Lin, Wang, Xuyang, Wang, Junjie, Li, Ming

Abstract

In this paper, we focus on improving the performance of text-dependent speaker verification systems when training data is limited. Deep learning based text-dependent speaker verification systems generally need a large-scale text-dependent training dataset, which can be labor-intensive and costly to collect, especially for customized new wake-up words. Recent studies have proposed voice conversion systems that can generate high-quality synthesized speech for both seen and unseen speakers. Inspired by these works, we adopt two different voice conversion methods, as well as a very simple re-sampling approach, to generate new text-dependent speech samples for data augmentation. Experimental results show that, with limited training data, the proposed method significantly reduces the Equal Error Rate (EER) from 6.51% to 4.51%.
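The abstract does not spell out how the re-sampling augmentation works. Below is a minimal sketch that assumes it behaves like speed perturbation:

- Each utterance is resampled by a factor, so tempo and pitch shift together. The factors 0.9 and 1.1 are illustrative, not taken from the paper.
- Each resampled copy gets a new, hypothetical speaker label such as `spk1_sp0.9`. This relies on the assumption that the pitch shift makes the copy sound like a different speaker.

The helper names `speed_perturb` and `augment_dataset` are hypothetical. Linear interpolation keeps the sketch dependency-free, but a real pipeline would more likely use a band-limited resampler.

```python
from typing import Dict, List, Tuple


def speed_perturb(samples: List[float], factor: float) -> List[float]:
    """Resample a waveform so it plays `factor` times faster.

    The input is read at positions 0, factor, 2*factor, ... with linear
    interpolation, giving about len(samples) / factor output samples.
    Played back at the original sample rate, tempo and pitch both scale
    by `factor`. (Linear interpolation is a simplification: it applies
    no anti-aliasing filter.)
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_in = len(samples)
    if n_in == 0:
        return []
    n_out = int(n_in / factor)
    out: List[float] = []
    for i in range(n_out):
        pos = i * factor
        left = int(pos)
        right = min(left + 1, n_in - 1)  # clamp at the last sample
        frac = pos - left
        out.append((1.0 - frac) * samples[left] + frac * samples[right])
    return out


def augment_dataset(
    dataset: Dict[str, List[List[float]]],
    factors: Tuple[float, ...] = (0.9, 1.1),  # illustrative factors (assumption)
) -> Dict[str, List[List[float]]]:
    """Return the original speakers plus one pseudo-speaker per factor.

    Assumption: each resampled copy is treated as a new speaker identity,
    labelled "<speaker>_sp<factor>" (hypothetical naming scheme).
    """
    augmented = {spk: list(utts) for spk, utts in dataset.items()}
    for spk, utts in dataset.items():
        for f in factors:
            augmented[f"{spk}_sp{f}"] = [speed_perturb(u, f) for u in utts]
    return augmented


if __name__ == "__main__":
    data = {"spk1": [[float(i) for i in range(10)]]}
    aug = augment_dataset(data)
    print(sorted(aug))  # ['spk1', 'spk1_sp0.9', 'spk1_sp1.1']
```

Treating each perturbed copy as a new speaker, rather than as extra data for the same speaker, is a design choice. It multiplies the number of training identities without recording anyone new, and that matters for a speaker classifier trained on a small wake-word corpus.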
