Paper Title

Effective Audio Classification Network Based on Paired Inverse Pyramid Structure and Dense MLP Block

Paper Authors

Yunhao Chen, Yunjie Zhu, Zihui Yan, Yifan Huang, Zhen Ren, Jianlu Shen, Lifang Chen

Paper Abstract

Recently, massive architectures based on Convolutional Neural Networks (CNNs) and self-attention mechanisms have become the norm for audio classification. While these techniques are state-of-the-art, their effectiveness can only be guaranteed at the price of huge computational cost and parameter counts, heavy data augmentation, transfer from large datasets, and other tricks. By exploiting the lightweight nature of audio, we propose an efficient network structure called the Paired Inverse Pyramid Structure (PIP) and a network built on it called the Paired Inverse Pyramid Structure MLP Network (PIPMN). PIPMN reaches 96% accuracy on Environmental Sound Classification (ESC) on the UrbanSound8K dataset and 93.2% on Music Genre Classification (MGC) on the GTZAN dataset, with only 1 million parameters. Both results are achieved without data augmentation or model transfer. Public code is available at: https://github.com/JNAIC/PIPMN
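
The abstract only names the building blocks (an MLP-based network over audio features with roughly 1M parameters); for readers who want a concrete picture, the sketch below shows a generic MLP-style audio classifier operating on log-mel spectrogram frames. This is an illustrative assumption, not the authors' PIP/PIPMN design: the class names (DenseMLPBlock, TinyMLPAudioClassifier), layer sizes, and pooling scheme are all hypothetical, and the actual architecture should be taken from the linked repository.

```python
# Hedged sketch only: a generic MLP-style audio classifier, NOT the PIPMN architecture.
# The real implementation is at https://github.com/JNAIC/PIPMN.
import torch
import torch.nn as nn


class DenseMLPBlock(nn.Module):
    """Hypothetical MLP block: LayerNorm + two linear layers with a residual connection."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.net(x)  # residual keeps the block cheap and easy to train


class TinyMLPAudioClassifier(nn.Module):
    """Toy classifier over log-mel frames (e.g. 10 classes for UrbanSound8K)."""

    def __init__(self, n_mels: int = 128, dim: int = 256, n_classes: int = 10):
        super().__init__()
        self.embed = nn.Linear(n_mels, dim)          # project mel bins to model width
        self.blocks = nn.Sequential(
            DenseMLPBlock(dim, dim * 2),
            DenseMLPBlock(dim, dim * 2),
        )
        self.head = nn.Linear(dim, n_classes)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, time, n_mels) log-mel spectrogram
        x = self.embed(mel)
        x = self.blocks(x)
        x = x.mean(dim=1)                            # average-pool over time frames
        return self.head(x)                          # class logits


if __name__ == "__main__":
    model = TinyMLPAudioClassifier()
    dummy = torch.randn(2, 400, 128)                 # two clips, 400 frames, 128 mel bins
    print(model(dummy).shape)                        # torch.Size([2, 10])
```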
