An Empirical Study on Text-Independent Speaker Verification based on the GE2E Method

Paper Title

An Empirical Study on Text-Independent Speaker Verification based on the GE2E Method

Author

Arasteh, Soroosh Tayebi

Abstract

While many researchers in the speaker recognition area have started to replace the former classical state-of-the-art methods with deep learning techniques, some of the traditional i-vector-based methods are still state-of-the-art in the context of text-independent speaker verification. Google's Generalized End-to-End Loss for Speaker Verification (GE2E), a deep learning-based technique using long short-term memory units, has recently gained a lot of attention due to its speed in convergence and generalization. In this study, we aim at further studying the GE2E method and comparing different scenarios in order to investigate all of its aspects. Various experiments including the effects of a random sampling of test and enrollment utterances, test utterance duration, and the number of enrollment utterances are discussed in this article. Furthermore, we compare the GE2E method with the baseline state-of-the-art i-vector-based methods for text-independent speaker verification and show that it outperforms them by resulting in lower error rates while being end-to-end and requiring less training time for convergence.
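To make the evaluation terms in the abstract concrete, the following is a minimal sketch of how a verification trial is commonly scored in embedding-based systems: enrollment utterance embeddings are averaged into a speaker centroid, a test embedding is compared against it by cosine similarity, and an equal error rate (EER) is estimated from the resulting scores. The function names, the toy two-dimensional embeddings, and the simple threshold sweep are illustrative assumptions and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def enroll(embeddings):
    """Average L2-normalized enrollment embeddings into one speaker centroid.

    Hypothetical helper: more enrollment utterances means more vectors averaged.
    """
    normed = [l2_normalize(e) for e in embeddings]
    dim = len(normed[0])
    return [sum(e[i] for e in normed) / len(normed) for i in range(dim)]


def cosine_score(centroid, test_embedding):
    """Cosine similarity between a speaker centroid and a test embedding."""
    c = l2_normalize(centroid)
    t = l2_normalize(test_embedding)
    return sum(a * b for a, b in zip(c, t))


def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER with a threshold sweep over all observed scores.

    Returns the mean of the false acceptance and false rejection rates at the
    threshold where the two are closest.
    """
    best_gap, eer = float("inf"), None
    for thr in sorted(set(genuine_scores) | set(impostor_scores)):
        frr = sum(s < thr for s in genuine_scores) / len(genuine_scores)
        far = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Toy usage with made-up embeddings and scores (assumptions for illustration).
centroid = enroll([[1.0, 0.0], [0.0, 1.0]])
same_direction = cosine_score(centroid, [1.0, 1.0])
toy_eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
```

In this sketch, varying the number of vectors passed to `enroll` corresponds to varying the number of enrollment utterances, and the scores fed to `equal_error_rate` would come from many target and non-target trials rather than the handful shown here.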
