Title

On Adversarial Robustness of Large-scale Audio Visual Learning

Authors

Li, Juncheng B.; Qu, Shuhui; Li, Xinjian; Huang, Po-Yao; Metze, Florian

Abstract

As audio-visual systems are being deployed for safety-critical tasks such as surveillance and malicious content filtering, their robustness remains an under-studied area. Existing published work on robustness either does not scale to large-scale datasets or does not deal with multiple modalities. This work aims to study several key questions related to multi-modal learning through the lens of robustness: 1) Are multi-modal models necessarily more robust than uni-modal models? 2) How can the robustness of multi-modal learning be measured efficiently? 3) How should different modalities be fused to achieve a more robust multi-modal model? To understand the robustness of multi-modal models in a large-scale setting, we propose a density-based metric and a convexity metric to efficiently measure the distribution of each modality in the high-dimensional latent space. Our work provides theoretical intuition together with empirical evidence showing how multi-modal fusion affects adversarial robustness through these metrics. We further devise a mix-up strategy based on our metrics to improve the robustness of the trained model. Our experiments on AudioSet and Kinetics-Sounds verify our hypothesis that multi-modal models are not necessarily more robust than their uni-modal counterparts in the face of adversarial examples. We also observe that our mix-up training method can achieve as much protection as traditional adversarial training, offering a computationally cheap alternative. Implementation: https://github.com/lijuncheng16/AudioSetDoneRight