Improved Relation Networks for End-to-End Speaker Verification and Identification

Paper title

Improved Relation Networks for End-to-End Speaker Verification and Identification

Authors

Ashutosh Chaubey, Sparsh Sinha, Susmita Ghose

Abstract

Speaker identification systems in a real-world scenario are tasked to identify a speaker amongst a set of enrolled speakers given just a few samples for each enrolled speaker. This paper demonstrates the effectiveness of meta-learning and relation networks for this use case. We propose improved relation networks for speaker verification and few-shot (unseen) speaker identification. The use of relation networks facilitates joint training of the frontend speaker encoder and the backend model. Inspired by the use of prototypical networks in speaker verification and to increase the discriminability of the speaker embeddings, we train the model to classify samples in the current episode amongst all speakers present in the training set. Furthermore, we propose a new training regime for faster model convergence by extracting more information from a given meta-learning episode with negligible extra computation. We evaluate the proposed techniques on VoxCeleb, SITW and VCTK datasets on the tasks of speaker verification and unseen speaker identification. The proposed approach outperforms the existing approaches consistently on both tasks.
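The abstract combines two training signals: a relation score between each query utterance and each enrolled speaker in the episode, and a classification of episode samples amongst all training-set speakers. The following is a minimal illustrative sketch of that idea, not the authors' architecture. It assumes embeddings are already computed, uses the mean of the support embeddings as each speaker's prototype, a tiny two-layer MLP as a hypothetical relation module, a squared-error relation target, and an unweighted sum of the two losses. All function names, shapes, and weights are assumptions made for illustration.

```python
import math
import random

def mean_vectors(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def relation_score(query, prototype, w_hidden, w_out):
    """Hypothetical relation module: a tiny MLP on the concatenated pair."""
    pair = query + prototype  # list concatenation, length 2 * d
    hidden = [max(0.0, sum(w * x for w, x in zip(row, pair))) for row in w_hidden]
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))  # score in (0, 1)

def global_log_softmax(embedding, w_cls):
    """Log-probabilities over all training speakers (one weight row per speaker)."""
    logits = [sum(w * x for w, x in zip(row, embedding)) for row in w_cls]
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return [z - log_norm for z in logits]

def episode_loss(support, queries, query_labels, global_ids, w_hidden, w_out, w_cls):
    """Assumed combined loss for one episode.

    support: {episode_label: [embedding, ...]}, the few enrolled samples per speaker.
    queries: list of query embeddings.
    query_labels: episode label of each query.
    global_ids: index of each query's speaker within the full training set.
    """
    labels = sorted(support)
    prototypes = {c: mean_vectors(support[c]) for c in labels}
    relation_loss, global_loss = 0.0, 0.0
    for q, y, g in zip(queries, query_labels, global_ids):
        for c in labels:
            target = 1.0 if c == y else 0.0  # 1 for the matching speaker, else 0
            score = relation_score(q, prototypes[c], w_hidden, w_out)
            relation_loss += (score - target) ** 2
        global_loss -= global_log_softmax(q, w_cls)[g]  # cross-entropy term
    n = len(queries)
    return relation_loss / (n * len(labels)) + global_loss / n

# Toy 2-way, 2-shot episode with random stand-in embeddings and weights.
rng = random.Random(0)
d, hidden_dim, n_train_speakers = 4, 8, 10

def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

support = {0: [rand_vec(d), rand_vec(d)], 1: [rand_vec(d), rand_vec(d)]}
queries = [rand_vec(d), rand_vec(d)]
w_hidden = [rand_vec(2 * d) for _ in range(hidden_dim)]
w_out = rand_vec(hidden_dim)
w_cls = [rand_vec(d) for _ in range(n_train_speakers)]

loss = episode_loss(support, queries, [0, 1], [3, 7], w_hidden, w_out, w_cls)
print(loss)
```

In an end-to-end setup the embeddings would come from the speaker encoder, so gradients of this loss would reach both the encoder and the relation module. That is the joint training of frontend and backend the abstract refers to. The sketch omits the encoder and the proposed faster-convergence training regime, which the abstract does not specify.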
