Sound Event Localization and Detection for Real Spatial Sound Scenes: Event-Independent Network and Data Augmentation Chains

Paper Title

Sound Event Localization and Detection for Real Spatial Sound Scenes: Event-Independent Network and Data Augmentation Chains

Authors

Jinbo Hu, Yin Cao, Ming Wu, Qiuqiang Kong, Feiran Yang, Mark D. Plumbley, Jun Yang

Abstract

Sound event localization and detection (SELD) is a joint task of sound event detection and direction-of-arrival estimation. In DCASE 2022 Task 3, the data change from computationally generated spatial recordings to recordings of real sound scenes. Our system submitted to DCASE 2022 Task 3 is based on our previously proposed Event-Independent Network V2 (EINV2) combined with a novel data augmentation method. The method employs EINV2 with a track-wise output format, permutation-invariant training, and a soft parameter-sharing strategy to detect different sound events of the same class at different locations. The Conformer structure is used to extend EINV2 so that it learns both local and global features. To improve generalization, the data augmentation method uses several data augmentation chains, each composed of a stochastic combination of different data augmentation operations. To mitigate the lack of real-scene recordings in the development dataset and the imbalance in the occurrence of sound event classes, we use FSD50K, AudioSet, and the TAU Spatial Room Impulse Response Database (TAU-SRIR DB) to generate simulated datasets for training. We present detailed results on the validation set of Sony-TAu Realistic Spatial Soundscapes 2022 (STARSS22). Experimental results indicate that generalizing to different environments and unbalanced performance across classes are the two main challenges. We evaluated the proposed method in Task 3 of the DCASE 2022 challenge and ranked second in the team ranking. The source code has been released.
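The abstract describes augmentation chains as stochastic combinations of several augmentation operations. The following is only a minimal sketch of that general idea, not the authors' implementation: the three operations (`random_gain`, `time_mask`, `add_noise`), their parameter ranges, the mono list-of-floats signal, and the choice to apply one randomly selected chain per example are all assumptions made for illustration.

```python
import random

# Hypothetical augmentation operations on a mono signal (list of floats).
# None of these are taken from the paper; they stand in for "operations".
def random_gain(x, rng):
    g = rng.uniform(0.5, 1.5)          # assumed gain range
    return [g * v for v in x]

def time_mask(x, rng):
    if not x:
        return []
    width = rng.randint(1, max(1, len(x) // 4))   # mask at most a quarter
    start = rng.randint(0, len(x) - width)
    return [0.0 if start <= i < start + width else v
            for i, v in enumerate(x)]

def add_noise(x, rng):
    return [v + rng.gauss(0.0, 0.01) for v in x]  # assumed noise level

OPERATIONS = [random_gain, time_mask, add_noise]

def sample_chain(rng, max_ops=3):
    """Draw a random-length, random-order chain of distinct operations."""
    k = rng.randint(1, min(max_ops, len(OPERATIONS)))
    return rng.sample(OPERATIONS, k)

def augment(x, rng, num_chains=2):
    """Build several chains, pick one at random (an assumption), apply it."""
    chains = [sample_chain(rng) for _ in range(num_chains)]
    chain = rng.choice(chains)
    y = list(x)
    for op in chain:
        y = op(y, rng)
    return y

if __name__ == "__main__":
    rng = random.Random(0)
    signal = [1.0] * 16
    print(augment(signal, rng))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible; a real SELD pipeline would operate on multichannel audio or spectrogram features and would need operations that keep the spatial labels consistent.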

Paper Title