Paper Title
DrumGAN: Synthesis of Drum Sounds With Timbral Feature Conditioning Using Generative Adversarial Networks
Paper Authors
Paper Abstract
Synthetic creation of drum sounds (e.g., in drum machines) is commonly performed using analog or digital synthesis, allowing a musician to sculpt the desired timbre by modifying various parameters. Typically, such parameters control low-level features of the sound and often have no musical meaning or perceptual correspondence. With the rise of Deep Learning, data-driven processing of audio emerges as an alternative to traditional signal processing. This new paradigm allows controlling the synthesis process through learned high-level features or by conditioning a model on musically relevant information. In this paper, we apply a Generative Adversarial Network to the task of audio synthesis of drum sounds. By conditioning the model on perceptual features computed with a publicly available feature extractor, intuitive control is gained over the generation process. The experiments are carried out on a large collection of kick, snare, and cymbal sounds. We show that, compared to a specific prior work based on a U-Net architecture, our approach considerably improves the quality of the generated drum samples, and that the conditional input indeed shapes the perceptual characteristics of the sounds. Also, we provide audio examples and release the code used in our experiments.