Why Does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

Paper Title

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

Authors

Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, Peidong Wang, Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei

Abstract

Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even when the pre-training objective is designed for speech recognition. In this paper, we study which factors lead to the success of self-supervised learning on speaker-related tasks, e.g. speaker verification (SV), through a series of carefully designed experiments. Our empirical results on the VoxCeleb-1 dataset suggest that the benefit of SSL to the SV task comes from a combination of the masked speech prediction loss, data scale, and model size, while the SSL quantizer has a minor impact. We further employ the integrated gradients attribution method and loss landscape visualization to understand the effectiveness of self-supervised learning for speaker recognition performance.
