ATST: Audio Representation Learning with Teacher-Student Transformer

Paper title

ATST: Audio Representation Learning with Teacher-Student Transformer

Authors

Li, Xian; Li, Xiaofei

Abstract

Self-supervised learning (SSL) learns knowledge from a large amount of unlabeled data, and then transfers that knowledge to a specific problem with a limited amount of labeled data. SSL has achieved promising results in various domains. This work addresses the problem of segment-level general audio SSL, and proposes a new transformer-based teacher-student SSL model, named ATST. A transformer encoder is developed on a recently emerged teacher-student baseline scheme, which largely improves the modeling capability of pre-training. In addition, a new strategy for positive pair creation is designed to fully leverage the capability of the transformer. Extensive experiments have been conducted, and the proposed model achieves new state-of-the-art results on almost all of the downstream tasks.

Paper title