Paper title
Exploring Train and Test-Time Augmentations for Audio-Language Learning
Paper authors
Paper abstract
In this paper, we aim to unveil the impact of data augmentation in audio-language multi-modal learning, which has not been explored despite its importance. We explore various augmentation methods at both train time and test time and find that proper data augmentation can lead to substantial improvements. Specifically, applying our proposed audio-language paired augmentation PairMix, the first multi-modal audio-language augmentation method, outperforms the baselines on both automated audio captioning and audio-text retrieval. To take full advantage of data augmentation, we also present multi-level test-time augmentation (Multi-TTA). We successfully combine the two proposed methods with uni-modal augmentations and achieve 47.5 SPIDEr on audio captioning, an 18.2% relative increase over the baseline. In audio-text retrieval, the proposed methods also improve performance.
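The abstract reports the final score and the relative gain but not the baseline score itself. As a rough check on what those two numbers imply, the short sketch below back-computes the baseline. The helper name `implied_baseline` is hypothetical, and the result is only an estimate, since both reported figures are rounded.

```python
def implied_baseline(score: float, relative_gain_pct: float) -> float:
    """Return b such that score == b * (1 + relative_gain_pct / 100)."""
    return score / (1.0 + relative_gain_pct / 100.0)


# Both values are taken from the abstract.
reported_spider = 47.5
relative_gain_pct = 18.2

# The baseline is NOT stated in the abstract; this is an inferred estimate.
baseline = implied_baseline(reported_spider, relative_gain_pct)
absolute_gain = reported_spider - baseline

print(f"implied baseline SPIDEr: {baseline:.1f}")    # 40.2
print(f"absolute gain: {absolute_gain:.1f} points")  # 7.3
```

Under these assumptions, the 18.2% relative increase corresponds to roughly 7 SPIDEr points over a baseline of about 40.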