Paper Title

Zero-Shot Audio Classification with Factored Linear and Nonlinear Acoustic-Semantic Projections

Authors

Huang Xie, Okko Räsänen, Tuomas Virtanen

Abstract

In this paper, we study zero-shot learning in audio classification through factored linear and nonlinear acoustic-semantic projections between audio instances and sound classes. Zero-shot learning in audio classification refers to classification problems that aim to recognize audio instances of sound classes that have no available training data, only semantic side information. We develop factored linear projections by applying rank decomposition to a bilinear model, and use nonlinear activation functions, such as tanh, to model the nonlinearity between acoustic embeddings and semantic embeddings. Experimental results show that, compared with the prior bilinear model, the proposed projection methods improve the classification performance of zero-shot learning in audio classification.
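The projections named in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the embedding sizes, the rank `r`, the random weights, the function names, and in particular the exact place where tanh is applied. The abstract only states that a bilinear model is rank-decomposed and that a nonlinear activation such as tanh is used.

```python
import numpy as np

def bilinear_score(a, s, W):
    """Compatibility of acoustic embedding a and semantic embedding s: a^T W s."""
    return a @ W @ s

def factored_score(a, s, U, V):
    """Rank-r factorization W = U^T V, so the score is (U a) . (V s)."""
    return (U @ a) @ (V @ s)

def nonlinear_score(a, s, U, V):
    """Assumed nonlinear variant: tanh applied to both projected embeddings."""
    return np.tanh(U @ a) @ np.tanh(V @ s)

def predict(a, class_embeddings, score_fn, *params):
    """Zero-shot prediction: pick the class whose semantic embedding scores highest."""
    scores = np.array([score_fn(a, s, *params) for s in class_embeddings])
    return int(np.argmax(scores)), scores

# Hypothetical sizes: acoustic dim, semantic dim, rank, number of unseen classes.
d_a, d_s, r, n_classes = 128, 300, 16, 5
rng = np.random.default_rng(0)

U = rng.standard_normal((r, d_a))   # projects acoustic embeddings to r dims
V = rng.standard_normal((r, d_s))   # projects semantic embeddings to r dims
W = U.T @ V                         # equivalent full bilinear matrix (rank <= r)

a = rng.standard_normal(d_a)                 # one audio instance (random stand-in)
S = rng.standard_normal((n_classes, d_s))    # unseen-class embeddings (random stand-in)

linear_pred, linear_scores = predict(a, S, factored_score, U, V)
nonlinear_pred, nonlinear_scores = predict(a, S, nonlinear_score, U, V)
```

In this sketch the factored form stores `r * (d_a + d_s)` weights instead of the `d_a * d_s` weights of a full bilinear matrix, which is the usual motivation for a rank decomposition. In practice `U` and `V` would be learned from the seen classes rather than sampled at random.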
