Domain Generalization with Relaxed Instance Frequency-wise Normalization for Multi-device Acoustic Scene Classification

Paper Title

Domain Generalization with Relaxed Instance Frequency-wise Normalization for Multi-device Acoustic Scene Classification

Authors

Byeonggeun Kim, Seunghan Yang, Jangho Kim, Hyunsin Park, Juntae Lee, Simyung Chang

Abstract

When two-dimensional convolutional neural networks (2D-CNNs) are used in image processing, domain information can be manipulated through channel statistics, and instance normalization has been a promising way to obtain domain-invariant features. In audio features, however, our analysis shows that domain-relevant information is dominated by frequency statistics rather than channel statistics. Motivated by this analysis, we introduce Relaxed Instance Frequency-wise Normalization (RFN): a plug-and-play, explicit normalization module along the frequency axis that eliminates instance-specific domain discrepancy in an audio feature while relaxing the undesirable loss of useful discriminative information. Empirically, simply adding RFN to networks shows clear margins over previous domain generalization approaches on acoustic scene classification and yields improved robustness across multiple audio devices. In particular, the proposed RFN won DCASE2021 Challenge Task 1A, low-complexity acoustic scene classification with multiple devices, with a clear margin; this work extends our technical report.
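The abstract describes RFN only in words, so the following is a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes a feature tensor shaped (batch, channels, frequency, time), an instance frequency-wise normalization that standardizes each frequency bin of each instance over channels and time, and a "relaxation" that blends this with a per-instance normalization over all axes using a weight `lam`. The function names, the blend form, and the default `lam=0.5` are all assumptions made for illustration.

```python
import numpy as np


def instance_frequency_norm(x, eps=1e-5):
    """Standardize each (instance, frequency bin) over channels and time.

    x has shape (batch, channels, freq, time). Assumed layout.
    """
    mean = x.mean(axis=(1, 3), keepdims=True)
    var = x.var(axis=(1, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def instance_layer_norm(x, eps=1e-5):
    """Standardize each instance over channels, frequency, and time."""
    mean = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def relaxed_ifn(x, lam=0.5, eps=1e-5):
    """Assumed relaxation: convex blend of the two normalizations.

    lam = 0 gives pure frequency-wise normalization; lam = 1 gives the
    per-instance normalization that keeps relative frequency structure.
    """
    return lam * instance_layer_norm(x, eps) + (1.0 - lam) * instance_frequency_norm(x, eps)


# Synthetic features with a frequency-dependent offset, standing in for
# a device-specific frequency response (hypothetical data).
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8, 16))
x = x + np.arange(8).reshape(1, 1, 8, 1)

y = instance_frequency_norm(x)   # per-frequency offset removed
z = relaxed_ifn(x, lam=0.5)      # offset only partially removed
```

In this sketch the frequency-wise branch removes the per-bin offset entirely, while the blended output retains part of it, which is the trade-off between suppressing domain discrepancy and keeping discriminative frequency information that the abstract describes.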
