Paper Title
Self-Transcriber: Few-shot Lyrics Transcription with Self-training
Paper Authors
Paper Abstract
Current lyrics transcription approaches rely heavily on supervised learning with labeled data, but such data are scarce and manual labeling of singing is expensive. How to benefit from unlabeled data and alleviate the limited-data problem has not been explored for lyrics transcription. We propose the first semi-supervised lyrics transcription paradigm, Self-Transcriber, which leverages unlabeled data using self-training with noisy student augmentation. We demonstrate the possibility of lyrics transcription with only a small amount of labeled data. Self-Transcriber generates pseudo-labels for the unlabeled singing with a teacher model, then adds the pseudo-labeled data to the labeled data to update the student model with both self-training and supervised training losses. This work narrows the gap between supervised and semi-supervised learning and opens the door to few-shot learning for lyrics transcription. Our experiments show that our approach, using only 12.7 hours of labeled data, achieves performance competitive with supervised approaches trained on 149.1 hours of labeled data.
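To make the described training loop concrete, below is a minimal sketch of teacher-student self-training with noisy student augmentation, assuming a CTC-based transcriber in PyTorch. Everything here (the toy Transcriber model, greedy_pseudo_labels decoding, additive-noise augmentation, and the loss weight lam) is an illustrative assumption, not the paper's actual architecture or hyperparameters.

```python
# Sketch: noisy-student self-training for a CTC transcriber.
# Hypothetical stand-in for the paradigm in the abstract; not the authors' code.
import torch
import torch.nn as nn

torch.manual_seed(0)

VOCAB = 30      # hypothetical character vocabulary (index 0 = CTC blank)
FEAT_DIM = 80   # e.g. log-mel feature dimension

class Transcriber(nn.Module):
    """Toy acoustic model: BiLSTM encoder + linear CTC head."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.LSTM(FEAT_DIM, 64, bidirectional=True, batch_first=True)
        self.head = nn.Linear(128, VOCAB)

    def forward(self, x):
        h, _ = self.encoder(x)
        return self.head(h).log_softmax(dim=-1)  # (B, T, VOCAB)

def greedy_pseudo_labels(log_probs):
    """Greedy CTC decoding: collapse repeats, drop blanks, yield pseudo labels."""
    ids = log_probs.argmax(dim=-1)  # (B, T)
    labels = []
    for seq in ids:
        out, prev = [], -1
        for t in seq.tolist():
            if t != prev and t != 0:  # 0 = blank
                out.append(t)
            prev = t
        labels.append(torch.tensor(out if out else [1]))  # avoid empty targets
    return labels

teacher, student = Transcriber(), Transcriber()
student.load_state_dict(teacher.state_dict())  # student starts from the teacher
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
lam = 1.0  # weight on the self-training loss (assumed)

# Toy batches standing in for the labeled / unlabeled singing data.
x_lab = torch.randn(4, 50, FEAT_DIM)
y_lab = torch.randint(1, VOCAB, (4, 12))
x_unl = torch.randn(4, 50, FEAT_DIM)

for step in range(3):
    # 1) Teacher pseudo-labels the unlabeled singing (no gradients, clean input).
    with torch.no_grad():
        pseudo = greedy_pseudo_labels(teacher(x_unl))

    # 2) Noisy-student step: the student sees augmented inputs
    #    (here, simple additive noise stands in for the augmentation).
    noisy_lab = x_lab + 0.1 * torch.randn_like(x_lab)
    noisy_unl = x_unl + 0.1 * torch.randn_like(x_unl)

    lp_lab = student(noisy_lab).transpose(0, 1)  # CTCLoss expects (T, B, V)
    lp_unl = student(noisy_unl).transpose(0, 1)

    in_lens = torch.full((4,), 50, dtype=torch.long)
    sup = ctc(lp_lab, y_lab, in_lens, torch.full((4,), 12, dtype=torch.long))
    tgt = nn.utils.rnn.pad_sequence(pseudo, batch_first=True)
    self_tr = ctc(lp_unl, tgt, in_lens,
                  torch.tensor([len(p) for p in pseudo]))

    # 3) Student update with both supervised and self-training losses.
    loss = sup + lam * self_tr
    opt.zero_grad(); loss.backward(); opt.step()
    print(f"step {step}: loss={loss.item():.3f}")
```

In a full pipeline the teacher would typically be refreshed from the trained student between rounds; the sketch omits that and any pseudo-label filtering for brevity.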