Paper Title

A Study of Few-Shot Audio Classification

Paper Authors

Piper Wolters, Chris Careaga, Brian Hutchinson, Lauren Phillips

Paper Abstract

Advances in deep learning have resulted in state-of-the-art performance for many audio classification tasks but, unlike humans, these systems traditionally require large amounts of data to make accurate predictions. Not every person or organization has access to those resources, and the organizations that do, like our field at large, do not reflect the demographics of our country. Enabling people to use machine learning without significant resource hurdles is important, because machine learning is an increasingly useful tool for solving problems, and can solve a broader set of problems when put in the hands of a broader set of people. Few-shot learning is a type of machine learning designed to enable the model to generalize to new classes with very few examples. In this research, we address two audio classification tasks (speaker identification and activity classification) with the Prototypical Network few-shot learning algorithm, and assess performance of various encoder architectures. Our encoders include recurrent neural networks, as well as one- and two-dimensional convolutional neural networks. We evaluate our model for speaker identification on the VoxCeleb dataset and ICSI Meeting Corpus, obtaining 5-shot 5-way accuracies of 93.5% and 54.0%, respectively. We also evaluate for activity classification from audio using few-shot subsets of the Kinetics 600 dataset and AudioSet, both drawn from YouTube videos, obtaining 51.5% and 35.2% accuracy, respectively.
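The Prototypical Network classifier named in the abstract works by embedding each audio example with an encoder, averaging each class's few labeled support embeddings into a "prototype", and labeling query examples by their distance to the nearest prototype. Below is a minimal PyTorch sketch of one such episode; the encoder interface, tensor shapes, and function name are illustrative assumptions, not the authors' implementation.

# Minimal sketch of one n-way Prototypical Network episode (PyTorch).
# The encoder (e.g., one of the paper's RNN or CNN architectures) and
# the tensor shapes below are illustrative assumptions.
import torch
import torch.nn.functional as F

def prototypical_episode(encoder, support, support_labels, query, n_way=5):
    # Embed support and query audio examples: each (N, d).
    z_support = encoder(support)
    z_query = encoder(query)
    # Each class prototype is the mean of that class's support embeddings.
    prototypes = torch.stack(
        [z_support[support_labels == c].mean(dim=0) for c in range(n_way)]
    )  # (n_way, d)
    # Classify queries by negative squared Euclidean distance to prototypes.
    dists = torch.cdist(z_query, prototypes) ** 2  # (n_query, n_way)
    return F.log_softmax(-dists, dim=1)  # log-probabilities over the n_way classes

In a 5-shot 5-way setting, `support` would hold 25 examples (5 per class) and `support_labels` their class indices 0 through 4; no gradient steps on the new classes are needed at evaluation time, which is what makes the approach suitable for low-resource settings.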
