Low-complexity CNNs for Acoustic Scene Classification

Paper title

Low-complexity CNNs for Acoustic Scene Classification

Authors

Singh, Arshdeep; Plumbley, Mark D.

Abstract

This paper presents a low-complexity framework for acoustic scene classification (ASC). Most of the frameworks designed for ASC use convolutional neural networks (CNNs) due to their learning ability and improved performance compared to hand-engineered features. However, CNNs are resource-hungry due to their large size and high computational complexity. Therefore, CNNs are difficult to deploy on resource-constrained devices. This paper addresses the problem of reducing the computational complexity and memory requirement in CNNs. We propose a low-complexity CNN architecture, and apply pruning and quantization to further reduce the parameters and memory. We then propose an ensemble framework that combines various low-complexity CNNs to improve the overall performance. An experimental evaluation of the proposed framework is performed on the publicly available DCASE 2022 Task 1 that focuses on ASC. The proposed ensemble framework has approximately 60K parameters, requires 19M multiply-accumulate operations, and improves the performance by approximately 2-4 percentage points compared to the DCASE 2022 Task 1 baseline network.
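The abstract reports complexity as a parameter count and a number of multiply-accumulate operations (MACs). As a rough illustration of how such figures are usually obtained, the sketch below counts both for a single standard 2-D convolution layer and estimates weight storage at a given bit width. The function names, the layer shape, and the 8-bit and 32-bit widths are hypothetical choices made for this example; they are not taken from the paper.

```python
def conv2d_cost(in_ch, out_ch, kernel, out_h, out_w, bias=True):
    """Return (parameters, MACs) for one standard 2-D convolution layer.

    Each output position of each output channel needs
    in_ch * k_h * k_w multiply-accumulate operations.
    """
    k_h, k_w = kernel
    weights = in_ch * out_ch * k_h * k_w
    params = weights + (out_ch if bias else 0)
    macs = weights * out_h * out_w
    return params, macs


def memory_bytes(params, bits):
    """Bytes needed to store `params` values at `bits` bits each (rounded up)."""
    return (params * bits + 7) // 8


# Hypothetical layer: 1 input channel, 16 filters of size 3x3,
# producing a 40x51 output map. These sizes are illustrative only.
params, macs = conv2d_cost(1, 16, (3, 3), 40, 51)
print(params, macs)

# Storage for roughly 60K parameters (the figure quoted in the abstract),
# assuming 32-bit floats versus an assumed 8-bit quantization.
print(memory_bytes(60_000, 32), memory_bytes(60_000, 8))
```

Summing these per-layer counts over a whole network gives totals comparable to the approximately 60K parameters and 19M MACs quoted above. Pruning lowers the counts by removing filters or weights, and quantization lowers the bytes stored per remaining parameter.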
