Language-based Audio Retrieval Task in DCASE 2022 Challenge

Paper Title

Language-based Audio Retrieval Task in DCASE 2022 Challenge

Authors

Huang Xie, Samuel Lipping, Tuomas Virtanen

Abstract

Language-based audio retrieval is a task in which natural-language captions are used as queries to retrieve audio signals from a dataset. It was first introduced in the DCASE 2022 Challenge as Subtask 6B of Task 6, which aims at developing computational systems that model the relationships between audio signals and free-form textual descriptions. In contrast to audio captioning (Subtask 6A), which generates captions for audio signals, language-based audio retrieval (Subtask 6B) ranks audio signals according to their relevance to natural-language captions. In the DCASE 2022 Challenge, the provided baseline system for Subtask 6B was significantly outperformed, with the top-performing system reaching 0.276 in mAP@10. This paper presents the outcome of Subtask 6B in terms of the submitted systems' performance and analysis.
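For readers unfamiliar with the mAP@10 metric quoted in the abstract, the following is a minimal sketch of how mean average precision at a cutoff of 10 might be computed from ranked retrieval results. The function names, the dictionary-based data layout, and the normalization by min(number of relevant items, k) are illustrative assumptions and are not taken from the paper or from the challenge's evaluation code.

```python
from typing import Dict, Sequence, Set


def average_precision_at_k(ranked_ids: Sequence[str],
                           relevant_ids: Set[str],
                           k: int = 10) -> float:
    """AP@k for one caption query (hypothetical helper, not challenge code)."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    # Walk the top-k ranked audio IDs; ranks are 1-based.
    for rank, audio_id in enumerate(ranked_ids[:k], start=1):
        if audio_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Assumed normalization: at most min(|relevant|, k) hits are possible.
    return precision_sum / min(len(relevant_ids), k)


def mean_average_precision_at_k(rankings: Dict[str, Sequence[str]],
                                relevance: Dict[str, Set[str]],
                                k: int = 10) -> float:
    """Mean of AP@k over all caption queries present in `rankings`."""
    scores = [
        average_precision_at_k(ranked, relevance.get(query, set()), k)
        for query, ranked in rankings.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy example with made-up IDs: each caption has one relevant audio clip.
rankings = {
    "caption_1": ["audio_a", "audio_b", "audio_c"],  # relevant clip at rank 1
    "caption_2": ["audio_x", "audio_y", "audio_z"],  # relevant clip at rank 3
}
relevance = {
    "caption_1": {"audio_a"},
    "caption_2": {"audio_z"},
}
score = mean_average_precision_at_k(rankings, relevance, k=10)
```

When each caption has exactly one relevant audio clip, as in the toy example, AP@10 under this definition reduces to the reciprocal rank of that clip if it appears in the top 10, and to zero otherwise.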
