Modelling black-box audio effects with time-varying feature modulation

Paper title

Modelling black-box audio effects with time-varying feature modulation

Authors

Marco Comunità, Christian J. Steinmetz, Huy Phan, Joshua D. Reiss

Abstract

Deep learning approaches for black-box modelling of audio effects have shown promise; however, the majority of existing work focuses on nonlinear effects with behaviour on relatively short time scales, such as guitar amplifiers and distortion. While recurrent and convolutional architectures can theoretically be extended to capture behaviour at longer time scales, we show that simply scaling the width, depth, or dilation factor of existing architectures does not result in satisfactory performance when modelling audio effects such as fuzz and dynamic range compression. To address this, we propose the integration of time-varying feature-wise linear modulation into existing temporal convolutional backbones, an approach that enables learnable adaptation of the intermediate activations. We demonstrate that our approach more accurately captures long-range dependencies for a range of fuzz and compressor implementations across both time- and frequency-domain metrics. We provide sound examples, source code, and pretrained models to facilitate reproducibility.
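
To make the idea of time-varying feature-wise linear modulation concrete, the following is a minimal sketch and not the authors' implementation. It assumes the activations are split into fixed-length blocks and that each channel in each block receives its own scale and shift. The function name `time_varying_film`, the `block_size` parameter, and all numeric values are hypothetical. In the approach described above the modulation parameters are learned; here they are passed in as plain arguments to keep the example self-contained.

```python
def time_varying_film(x, gamma, beta, block_size):
    """Apply a per-channel affine transform whose parameters change per block.

    x:     list of channels, each a list of samples
    gamma: list of channels, each a list of per-block scales
    beta:  list of channels, each a list of per-block shifts
    (All names and shapes are assumptions made for illustration.)
    """
    y = []
    for channel, scales, shifts in zip(x, gamma, beta):
        out = []
        for t, sample in enumerate(channel):
            block = t // block_size  # index of the block this sample falls in
            out.append(scales[block] * sample + shifts[block])
        y.append(out)
    return y


# Hypothetical values: one channel, four samples, two blocks of two samples.
activations = [[1.0, 2.0, 3.0, 4.0]]
gamma = [[1.0, 0.5]]  # scale per block
beta = [[0.0, 1.0]]   # shift per block
modulated = time_varying_film(activations, gamma, beta, block_size=2)
```

Because the scale and shift are indexed by block and not fixed for the whole signal, the same convolutional features can be treated differently at different points in time, which is the property the abstract relies on for effects with slow dynamics such as compression.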
