Paper Title
Music-to-Text Synaesthesia: Generating Descriptive Text from Music Recordings
Paper Authors
Paper Abstract
In this paper, we consider a novel research problem: music-to-text synaesthesia. Unlike the classical music tagging problem, which classifies a music recording into pre-defined categories, music-to-text synaesthesia aims to generate descriptive texts that carry the same sentiment as the music recording, enabling further understanding. As existing music-related datasets do not contain semantic descriptions of music recordings, we collect a new dataset of 1,955 aligned pairs of classical music recordings and text descriptions. Based on this, we build a computational model to generate sentences that describe the content of a music recording. To handle the highly non-discriminative nature of classical music, we design a group topology-preservation loss, which takes a group of samples as the reference and preserves the relative topology among different samples. Extensive experimental results qualitatively and quantitatively demonstrate the effectiveness of our proposed model over five heuristic or pre-trained competitive methods and their variants on the collected dataset.
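The abstract does not spell out how the group topology-preservation loss is computed. The following is a minimal, hypothetical sketch, assuming the loss aligns the pairwise-similarity structure of a group (batch) of music embeddings with that of the corresponding text embeddings so that relative relations among samples are preserved across modalities; the function name, similarity measure, and tensor shapes are illustrative assumptions, not the authors' definition.

```python
import torch
import torch.nn.functional as F


def group_topology_preservation_loss(music_emb: torch.Tensor,
                                     text_emb: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch: match the relational structure of a group of
    music embeddings to that of the paired text embeddings.

    music_emb, text_emb: (batch_size, dim) embeddings of aligned pairs.
    """
    # Cosine-similarity matrices capture the relative "topology"
    # of the group within each modality.
    music_sim = F.cosine_similarity(music_emb.unsqueeze(1),
                                    music_emb.unsqueeze(0), dim=-1)
    text_sim = F.cosine_similarity(text_emb.unsqueeze(1),
                                   text_emb.unsqueeze(0), dim=-1)
    # Penalize discrepancies between the two relational structures.
    return F.mse_loss(music_sim, text_sim)


if __name__ == "__main__":
    music = torch.randn(8, 256)  # toy batch of music-encoder outputs
    text = torch.randn(8, 256)   # toy batch of text-encoder outputs
    print(group_topology_preservation_loss(music, text))
```

In such a formulation, the batch acts as the "group reference": rather than pulling each music-text pair together in isolation, the loss constrains how samples relate to one another, which is one plausible reading of "preserves the relative topology among different samples".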