Paper Title

Deep Conditional Representation Learning for Drum Sample Retrieval by Vocalisation

Paper Authors

Alejandro Delgado, Charalampos Saitis, Emmanouil Benetos, Mark Sandler

Paper Abstract

Imitating musical instruments with the human voice is an efficient way of communicating ideas between music producers, from sketching melody lines to clarifying desired sonorities. For this reason, there is an increasing interest in building applications that allow artists to efficiently pick target samples from big sound libraries just by imitating them vocally. In this study, we investigated the potential of conditional autoencoder models to learn informative features for Drum Sample Retrieval by Vocalisation (DSRV). We assessed the usefulness of their embeddings using four evaluation metrics, two of them relative to their acoustic properties and two of them relative to their perceptual properties via human listeners' similarity ratings. Results suggest that models conditioned on both sound-type labels (drum vs imitation) and drum-type labels (kick vs snare vs closed hi-hat vs opened hi-hat) learn the most informative embeddings for DSRV. We finally looked into individual differences in vocal imitation style via the Mantel test and found salient differences among participants, highlighting the importance of user information when designing DSRV systems.
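For illustration, below is a minimal sketch of the kind of conditional autoencoder the abstract describes, written in PyTorch under assumed settings (flattened spectrogram features of size 1024, a 32-dimensional latent space, and a 6-dimensional one-hot condition vector combining the sound-type and drum-type labels). These choices are assumptions for the sketch, not the authors' implementation.

```python
# Minimal conditional autoencoder sketch; layer sizes, feature dimensions,
# and label handling are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn

class ConditionalAutoencoder(nn.Module):
    def __init__(self, input_dim=1024, latent_dim=32, num_conditions=6):
        super().__init__()
        # Condition labels (e.g. sound type: drum vs imitation, plus drum type:
        # kick / snare / closed hi-hat / opened hi-hat) enter as a one-hot
        # vector concatenated to both the encoder input and the latent code.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim + num_conditions, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + num_conditions, 256),
            nn.ReLU(),
            nn.Linear(256, input_dim),
        )

    def forward(self, x, condition):
        z = self.encoder(torch.cat([x, condition], dim=-1))
        x_hat = self.decoder(torch.cat([z, condition], dim=-1))
        return x_hat, z

# Usage sketch: embed a batch of 8 sounds with 6-dimensional condition vectors;
# retrieval would then be nearest-neighbour search over the latent codes z.
model = ConditionalAutoencoder()
x = torch.randn(8, 1024)                      # placeholder spectrogram features
cond = torch.zeros(8, 6); cond[:, 0] = 1.0    # placeholder one-hot conditions
x_hat, z = model(x, cond)
loss = nn.functional.mse_loss(x_hat, x)       # reconstruction objective
```

The individual-difference analysis mentioned at the end of the abstract uses the Mantel test, which measures the correlation between two distance matrices under joint row/column permutations. A simple NumPy permutation version (again illustrative, not necessarily the authors' exact procedure) could look like this:

```python
# Permutation-based Mantel test between two square, symmetric distance
# matrices, using the Pearson correlation of their upper-triangle entries.
import numpy as np

def mantel_test(d1, d2, n_permutations=9999, seed=0):
    rng = np.random.default_rng(seed)
    iu = np.triu_indices_from(d1, k=1)          # upper-triangle indices
    r_obs = np.corrcoef(d1[iu], d2[iu])[0, 1]   # observed correlation
    count = 0
    n = d1.shape[0]
    for _ in range(n_permutations):
        perm = rng.permutation(n)
        d2_perm = d2[np.ix_(perm, perm)]        # permute rows/columns jointly
        if np.corrcoef(d1[iu], d2_perm[iu])[0, 1] >= r_obs:
            count += 1
    p_value = (count + 1) / (n_permutations + 1)  # one-tailed p-value
    return r_obs, p_value
```

Here d1 and d2 would be, for example, a pairwise dissimilarity matrix over one participant's vocal imitations and the corresponding matrix computed in the learned embedding space.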
