Speech SimCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning

Paper title

Speech SimCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning

Authors

Dongwei Jiang, Wubo Li, Miao Cao, Wei Zou, Xiangang Li

Abstract

Self-supervised visual pretraining has shown significant progress recently. Among these methods, SimCLR greatly advanced the state of the art in self-supervised and semi-supervised learning on ImageNet. The input feature representations for speech and visual tasks are both continuous, so it is natural to consider applying a similar objective to speech representation learning. In this paper, we propose Speech SimCLR, a new self-supervised objective for speech representation learning. During training, Speech SimCLR applies augmentation to raw speech and its spectrogram. Its objective combines a contrastive loss, which maximizes agreement between differently augmented samples in the latent space, with a reconstruction loss on the input representation. The proposed method achieved competitive results on speech emotion recognition and speech recognition.
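
The combined objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the contrastive term follows the NT-Xent loss used by SimCLR, while the L1 reconstruction term, the `recon_weight` and `temperature` values, and all function names are assumptions made for illustration only.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.1):
    """NT-Xent contrastive loss over two augmented views.

    z_a[i] and z_b[i] are latent vectors of two augmentations of sample i.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own negative
    # Row i pairs with row i + n, and row i + n pairs with row i.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), positives].mean()


def reconstruction_loss(reconstructed, target):
    """Mean absolute error (assumed here; the abstract does not name the loss)."""
    return np.abs(reconstructed - target).mean()


def combined_loss(z_a, z_b, reconstructed, target,
                  recon_weight=1.0, temperature=0.1):
    """Contrastive term plus a weighted reconstruction term (weight is hypothetical)."""
    contrastive = nt_xent_loss(z_a, z_b, temperature)
    return contrastive + recon_weight * reconstruction_loss(reconstructed, target)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latents_a = rng.normal(size=(4, 8))                      # view 1 latents
    latents_b = latents_a + 0.05 * rng.normal(size=(4, 8))   # view 2 latents
    features = rng.normal(size=(4, 20, 80))                  # stand-in spectrogram frames
    decoded = features + 0.1 * rng.normal(size=features.shape)
    print(combined_loss(latents_a, latents_b, decoded, features))
```

The contrastive term decreases as the two views of the same utterance move closer in the latent space relative to the other samples in the batch, while the reconstruction term keeps the representation tied to the input features.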
