Analyzing speaker verification embedding extractors and back-ends under language and channel mismatch

Paper title

Analyzing speaker verification embedding extractors and back-ends under language and channel mismatch

Authors

Anna Silnova, Themos Stafylakis, Ladislav Mosner, Oldrich Plchot, Johan Rohdin, Pavel Matejka, Lukas Burget, Ondrej Glembek, Niko Brummer

Abstract

In this paper, we analyze the behavior and performance of speaker embeddings and the back-end scoring model under domain and language mismatch. We present our findings regarding ResNet-based speaker embedding architectures and show that reduced temporal stride yields improved performance. We then consider a PLDA back-end and show how a combination of a small speaker subspace, a language-dependent PLDA mixture, and nuisance-attribute projection can have a drastic impact on the performance of the system. In addition, we present an efficient way of scoring and fusing class posterior logit vectors, which were recently shown to perform well for the speaker verification task. The experiments are performed using the NIST SRE 2021 setup.
