Domestic Activity Clustering from Audio via Depthwise Separable Convolutional Autoencoder Network

Paper Title

Domestic Activity Clustering from Audio via Depthwise Separable Convolutional Autoencoder Network

Authors

Yanxiong Li, Wenchang Cao, Konstantinos Drossos, Tuomas Virtanen

Abstract

Automatic estimation of domestic activities from audio can help solve many problems, such as reducing the labor cost of caring for elderly people. This study addresses the problem of domestic activity clustering from audio. The goal of domestic activity clustering is to group audio clips that belong to the same category of domestic activity into one cluster in an unsupervised way. In this paper, we propose a method of domestic activity clustering using a depthwise separable convolutional autoencoder network. In the proposed method, initial embeddings are learned by the depthwise separable convolutional autoencoder, and a clustering-oriented loss is designed to jointly optimize embedding refinement and cluster assignment. Different methods are evaluated on a public dataset (a derivative of the SINS dataset) used in the 2018 challenge on Detection and Classification of Acoustic Scenes and Events (DCASE). Our method obtains a normalized mutual information (NMI) score of 54.46% and a clustering accuracy (CA) score of 63.64%, and outperforms state-of-the-art methods in terms of NMI and CA. In addition, both the computational complexity and the memory requirement of our method are lower than those of previous deep-model-based methods. Code: https://github.com/vinceasvp/domestic-activity-clustering-from-audio
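The two evaluation metrics named in the abstract, NMI and CA, can be illustrated with a minimal sketch. It is not taken from the authors' repository. The function names are hypothetical, the NMI normalization (arithmetic mean of the two entropies) is an assumption because the abstract does not state which variant is used, and the best cluster-to-class mapping for CA is found by brute force over permutations rather than by the Hungarian algorithm that is commonly used for this step, so it is only practical for a small number of clusters.

```python
import math
from collections import Counter
from itertools import permutations


def normalized_mutual_information(true_labels, cluster_labels):
    """NMI = 2 * I(T; C) / (H(T) + H(C)); the normalization is an assumption."""
    n = len(true_labels)
    count_t = Counter(true_labels)
    count_c = Counter(cluster_labels)
    count_tc = Counter(zip(true_labels, cluster_labels))

    def entropy(counts):
        return -sum((v / n) * math.log(v / n) for v in counts.values())

    # I(T; C) = sum over non-empty cells of p(t, c) * log(p(t, c) / (p(t) * p(c)))
    mutual_info = sum(
        (v / n) * math.log(v * n / (count_t[t] * count_c[c]))
        for (t, c), v in count_tc.items()
    )
    denom = entropy(count_t) + entropy(count_c)
    # Both labelings consist of a single group: treat as perfect agreement.
    return 2.0 * mutual_info / denom if denom > 0 else 1.0


def clustering_accuracy(true_labels, cluster_labels):
    """Fraction of clips labeled correctly under the best one-to-one
    mapping from clusters to classes (brute-force search)."""
    classes = list(dict.fromkeys(true_labels))
    clusters = list(dict.fromkeys(cluster_labels))
    # If there are more clusters than classes, pad with dummy classes
    # that never match a true label.
    padding = [object() for _ in range(max(0, len(clusters) - len(classes)))]
    best_hits = 0
    for perm in permutations(classes + padding, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_labels))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


# Toy example with made-up labels (not data from the paper).
toy_true = ["cooking", "cooking", "cooking", "eating", "eating",
            "vacuum", "vacuum", "vacuum"]
toy_pred = [1, 1, 0, 0, 0, 2, 2, 2]
print("NMI:", normalized_mutual_information(toy_true, toy_pred))
print("CA:", clustering_accuracy(toy_true, toy_pred))
```

In the toy example, seven of the eight clips fall in the cluster mapped to their true class, so CA is 0.875. Both metrics are invariant to how the cluster indices are numbered, which is why they suit unsupervised evaluation.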

Paper Title