Paper Title

NAS-VAD: Neural Architecture Search for Voice Activity Detection

Authors

Daniel Rho, Jinhyeok Park, Jong Hwan Ko

Abstract

Various neural network-based approaches have been proposed for more robust and accurate voice activity detection (VAD). Manually designing such neural architectures is an error-prone and time-consuming process, which prompted the development of neural architecture search (NAS), a technique that automatically designs and optimizes network architectures. While NAS has been successfully applied to improve performance in a variety of tasks, it has not yet been exploited in the VAD domain. In this paper, we present the first work that applies NAS approaches to the VAD task. To search effectively for architectures suited to VAD, we propose a modified macro structure and a new search space with a much broader range of operations, including attention operations. The results show that the network structures found by the proposed NAS framework outperform previous manually designed state-of-the-art VAD models on various noise-added and real-world-recorded datasets. We also show that architectures searched on a particular dataset achieve improved generalization performance on unseen audio datasets. Our code and models are available at https://github.com/daniel03c1/NAS_VAD.
