Paper Title

Frequency and Multi-Scale Selective Kernel Attention for Speaker Verification

Paper Authors

Mun, Sung Hwan; Jung, Jee-weon; Han, Min Hyun; Kim, Nam Soo

Paper Abstract

The majority of recent state-of-the-art speaker verification architectures adopt multi-scale processing and frequency-channel attention mechanisms. Convolutional layers of these models typically have a fixed kernel size, e.g., 3 or 5. In this study, we further contribute to this line of research by utilising a selective kernel attention (SKA) mechanism. The SKA mechanism allows each convolutional layer to adaptively select the kernel size in a data-driven fashion. It is based on an attention mechanism that exploits both the frequency and channel domains. We first apply the existing SKA module to our baseline. Then we propose two SKA variants, where the first variant is applied in front of the ECAPA-TDNN model and the other is combined with the Res2net backbone block. Through extensive experiments, we demonstrate that our two proposed SKA variants consistently improve performance and are complementary when tested on three different evaluation protocols.
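For readers unfamiliar with the mechanism, the following is a minimal PyTorch sketch of the general selective-kernel idea the abstract describes: parallel convolution branches with different kernel sizes (e.g., 3 and 5) whose outputs are fused by a learned soft attention over the branches, so each layer effectively selects its kernel size from the data. This is an illustrative approximation rather than the paper's exact frequency-channel SKA module; the class name, kernel sizes, and reduction ratio are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveKernelConv1d(nn.Module):
    # Illustrative selective-kernel block (in the spirit of SK convolution),
    # not the paper's exact SKA module: parallel conv branches with different
    # kernel sizes, fused by a softmax attention over the branches.
    def __init__(self, channels, kernel_sizes=(3, 5), reduction=4):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(channels, channels, k, padding=k // 2),
                nn.BatchNorm1d(channels),
                nn.ReLU(inplace=True),
            )
            for k in kernel_sizes
        ])
        hidden = max(channels // reduction, 8)
        self.fc = nn.Sequential(nn.Linear(channels, hidden), nn.ReLU(inplace=True))
        # One attention head per branch; softmax across branches "selects" the kernel.
        self.attn = nn.ModuleList([nn.Linear(hidden, channels) for _ in kernel_sizes])

    def forward(self, x):                                             # x: (B, C, T)
        feats = torch.stack([b(x) for b in self.branches], dim=1)     # (B, K, C, T)
        summary = self.fc(feats.sum(dim=1).mean(dim=-1))              # (B, hidden)
        logits = torch.stack([a(summary) for a in self.attn], dim=1)  # (B, K, C)
        weights = F.softmax(logits, dim=1).unsqueeze(-1)              # (B, K, C, 1)
        return (weights * feats).sum(dim=1)                           # (B, C, T)

x = torch.randn(2, 64, 200)                  # e.g. 64-channel frame-level features
print(SelectiveKernelConv1d(64)(x).shape)    # torch.Size([2, 64, 200])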
