Paper Title
Language Prior Is Not the Only Shortcut: A Benchmark for Shortcut Learning in VQA
Authors
Abstract
Visual Question Answering (VQA) models are prone to learning the shortcut solution formed by dataset biases rather than the intended solution. To evaluate the reasoning ability of VQA models beyond shortcut learning, the VQA-CP v2 dataset introduces a distribution shift between the training and test sets given a question type. In this way, the model cannot use the training-set shortcut (from question type to answer) to perform well on the test set. However, VQA-CP v2 considers only one type of shortcut and thus still cannot guarantee that the model relies on the intended solution rather than on a solution specific to this shortcut. To overcome this limitation, we propose a new dataset that considers varying types of shortcuts by constructing different distribution shifts in multiple out-of-distribution (OOD) test sets. In addition, we overcome three troubling practices in the use of VQA-CP v2, e.g., selecting models using OOD test sets, and further standardize the OOD evaluation procedure. Our benchmark provides a more rigorous and comprehensive testbed for shortcut learning in VQA. We benchmark recent methods and find that methods specifically designed for particular shortcuts fail to simultaneously generalize to our varying OOD test sets. We also systematically study the varying shortcuts and provide several valuable findings, which may promote the exploration of shortcut learning in VQA.
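To make the question-type shortcut concrete, here is a minimal sketch in Python. It is not the paper's method or data: the toy (question type, answer) pairs, the helper names `fit_question_type_prior` and `accuracy`, and the use of plain exact-match accuracy are all assumptions chosen for illustration. The "model" ignores the image entirely and answers with the most frequent training answer for each question type, which is the kind of shortcut a shifted answer distribution is meant to expose.

```python
from collections import Counter, defaultdict


def fit_question_type_prior(samples):
    """Map each question type to its most frequent training answer."""
    counts = defaultdict(Counter)
    for question_type, answer in samples:
        counts[question_type][answer] += 1
    return {qt: c.most_common(1)[0][0] for qt, c in counts.items()}


def accuracy(prior, samples):
    """Exact-match accuracy of the prior-only predictor (an assumed metric)."""
    correct = sum(prior.get(qt) == answer for qt, answer in samples)
    return correct / len(samples)


# Hypothetical toy data: (question type, answer) pairs, images omitted.
train = (
    [("what color", "white")] * 8 + [("what color", "black")] * 2
    + [("is there", "yes")] * 7 + [("is there", "no")] * 3
)
# In-distribution test set: same answer distribution per question type.
iid_test = (
    [("what color", "white")] * 4 + [("what color", "black")] * 1
    + [("is there", "yes")] * 4 + [("is there", "no")] * 1
)
# OOD test set: answer distribution per question type is reversed.
ood_test = (
    [("what color", "white")] * 1 + [("what color", "black")] * 4
    + [("is there", "yes")] * 1 + [("is there", "no")] * 4
)

prior = fit_question_type_prior(train)
iid_acc = accuracy(prior, iid_test)
ood_acc = accuracy(prior, ood_test)
print(f"IID accuracy: {iid_acc:.2f}, OOD accuracy: {ood_acc:.2f}")
```

In this toy setting the shortcut looks strong on the in-distribution split and collapses on the shifted split. A model could, however, exploit a different shortcut that this particular shift does not penalize, which is the gap that multiple OOD test sets with different shifts are intended to cover.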