Hyperbolic Audio Source Separation

Paper title

Hyperbolic Audio Source Separation

Authors

Darius Petermann, Gordon Wichern, Aswin Subramanian, Jonathan Le Roux

Abstract

We introduce a framework for audio source separation using embeddings on a hyperbolic manifold that compactly represent the hierarchical relationship between sound sources and time-frequency features. Inspired by recent successes in modeling hierarchical relationships in text and images with hyperbolic embeddings, our algorithm obtains a hyperbolic embedding for each time-frequency bin of a mixture signal and estimates masks using hyperbolic softmax layers. On a synthetic dataset containing mixtures of multiple people talking and musical instruments playing, our hyperbolic model performed comparably to a Euclidean baseline in terms of source-to-distortion ratio, with stronger performance at low embedding dimensions. Furthermore, we find that time-frequency regions containing multiple overlapping sources are embedded towards the center (i.e., the most uncertain region) of the hyperbolic space, and we can use this certainty estimate to efficiently trade off between artifact introduction and interference reduction when isolating individual sounds.
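
To make the certainty idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the Poincaré ball model of hyperbolic space with curvature -c, uses random vectors in place of a trained network's per-bin outputs, and treats the hyperbolic distance from the origin as the certainty score. The function names, the array shape, and the threshold value are all hypothetical, and the hyperbolic softmax layer mentioned in the abstract is not reproduced here.

```python
import numpy as np

def expmap0(v, c=1.0, eps=1e-9):
    """Exponential map at the origin: tangent vectors -> Poincare ball (curvature -c)."""
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def dist_to_origin(x, c=1.0, eps=1e-9):
    """Hyperbolic distance from the origin of the Poincare ball to each point in x."""
    sqrt_c = np.sqrt(c)
    scaled = np.clip(sqrt_c * np.linalg.norm(x, axis=-1), 0.0, 1.0 - eps)
    return 2.0 / sqrt_c * np.arctanh(scaled)

# Stand-in for network outputs: one vector per time-frequency bin.
# Shape (frequency bins, frames, embedding dim) is an illustrative assumption.
rng = np.random.default_rng(0)
tangent = rng.normal(scale=0.5, size=(4, 6, 2))

embeddings = expmap0(tangent)           # points strictly inside the unit ball
certainty = dist_to_origin(embeddings)  # small near the center, large near the boundary

# Hypothetical use: keep only bins whose certainty exceeds a chosen threshold.
threshold = 1.0  # assumed value, not taken from the paper
keep = certainty > threshold
```

Under these assumptions, raising `threshold` discards more of the bins embedded near the center, which the abstract identifies as the regions where multiple sources overlap.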
