Self-Supervised Learning of Context-Aware Pitch Prosody Representations

Paper Title

Self-Supervised Learning of Context-Aware Pitch Prosody Representations

Authors

Camille Noufi, Prateek Verma

Abstract

In music and speech, meaning is derived at multiple levels of context. Affect, for example, can be inferred both by a short sound token and by sonic patterns over a longer temporal window such as an entire recording. In this letter, we focus on inferring meaning from this dichotomy of contexts. We show how contextual representations of short sung vocal lines can be implicitly learned from fundamental frequency ($F_0$) and thus be used as a meaningful feature space for downstream Music Information Retrieval (MIR) tasks. We propose three self-supervised deep learning paradigms which leverage pseudotask learning of these two levels of context to produce latent representation spaces. We evaluate the usefulness of these representations by embedding unseen pitch contours into each space and conducting downstream classification tasks. Our results show that contextual representation can enhance downstream classification by as much as 15\% as compared to using traditional statistical contour features.
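The abstract compares learned embeddings against "traditional statistical contour features" without listing them. As a rough illustration only, the sketch below computes a few global statistics from a single $F_0$ contour. The feature set (mean, standard deviation, range, and linear slope in cents), the 440 Hz reference, the convention that unvoiced frames are marked with 0, and the function names `hz_to_cents` and `contour_stats` are all assumptions made for this example, not the authors' actual baseline.

```python
import math
import statistics


def hz_to_cents(f0_hz, ref_hz=440.0):
    """Convert F0 values in Hz to cents relative to ref_hz (assumed reference)."""
    return [1200.0 * math.log2(f / ref_hz) for f in f0_hz]


def contour_stats(f0_hz, ref_hz=440.0):
    """Summarize one pitch contour with a few global statistics.

    Hypothetical baseline: frames with F0 <= 0 are treated as unvoiced
    and dropped before any statistic is computed.
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    cents = hz_to_cents(voiced, ref_hz)
    n = len(cents)
    # Least-squares slope of cents against the index of each voiced frame.
    mean_t = (n - 1) / 2
    mean_c = statistics.fmean(cents)
    num = sum((t - mean_t) * (c - mean_c) for t, c in enumerate(cents))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return {
        "mean_cents": mean_c,
        "std_cents": statistics.pstdev(cents),
        "range_cents": max(cents) - min(cents),
        "slope_cents_per_frame": num / den,
    }


# Made-up rising contour with one unvoiced frame.
contour = [220.0, 0.0, 233.08, 246.94, 261.63]
features = contour_stats(contour)
```

A fixed-length vector like `features` discards the temporal shape of the contour and any recording-level context, which is the kind of information the self-supervised representations described in the abstract are meant to capture.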

Paper Title