Paper Title
Upmixing via style transfer: a variational autoencoder for disentangling spatial images and musical content
Paper Authors
Paper Abstract
In the stereo-to-multichannel upmixing problem for music, one of the main tasks is to set the directionality of the instrument sources in the multichannel rendering results. In this paper, we propose a modified variational autoencoder model that learns a latent space to describe the spatial images in multichannel music. We seek to disentangle the spatial images and music content, so the learned latent variables are invariant to the music. At test time, we use the latent variables to control the panning of sources. We propose two upmixing use cases: transferring the spatial images from one song to another and blind panning based on the generative model. We report objective and subjective evaluation results to empirically show that our model captures spatial images separately from music content and achieves transfer-based interactive panning.
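Below is a minimal, illustrative sketch of the kind of conditional variational autoencoder the abstract describes: an encoder maps spatial cues to a latent variable, while the decoder is also fed content features, so the latent can be swapped between songs to transfer spatial images. This is not the authors' implementation; the feature choices, layer sizes, and loss weighting are all assumptions for illustration only.

```python
# Hypothetical sketch of a VAE whose latent encodes spatial image information
# while the decoder is conditioned on (music) content features. Not the paper's
# actual architecture; all dimensions and feature definitions are assumed.
import torch
import torch.nn as nn


class SpatialVAE(nn.Module):
    def __init__(self, spatial_dim=128, content_dim=128, latent_dim=16):
        super().__init__()
        # Encoder sees only spatial cues (e.g., inter-channel level differences).
        self.encoder = nn.Sequential(
            nn.Linear(spatial_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        self.to_mu = nn.Linear(128, latent_dim)
        self.to_logvar = nn.Linear(128, latent_dim)
        # Decoder reconstructs spatial cues from z plus content features,
        # encouraging z to carry only spatial (music-invariant) information.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + content_dim, 256), nn.ReLU(),
            nn.Linear(256, spatial_dim),
        )

    def forward(self, spatial_feat, content_feat):
        h = self.encoder(spatial_feat)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z differentiably from N(mu, sigma^2).
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        recon = self.decoder(torch.cat([z, content_feat], dim=-1))
        return recon, mu, logvar


def vae_loss(recon, target, mu, logvar, beta=1.0):
    # Standard ELBO: reconstruction term plus beta-weighted KL to the unit prior.
    rec = nn.functional.mse_loss(recon, target)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + beta * kl


if __name__ == "__main__":
    model = SpatialVAE()
    spatial = torch.randn(8, 128)   # dummy spatial-cue features of one song
    content = torch.randn(8, 128)   # dummy content features (possibly another song)
    recon, mu, logvar = model(spatial, content)
    print(vae_loss(recon, spatial, mu, logvar).item())
```

In such a setup, style-transfer upmixing would amount to encoding the spatial latent from a reference song and decoding it together with the content features of the target song; the details of how the paper enforces music invariance are not specified in the abstract.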