Paper Title
CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering
Paper Authors
Paper Abstract
Visual Question Answering (VQA) is a multi-discipline research task. To produce the right answer, it requires an understanding of the visual content of images and the natural language questions, as well as commonsense reasoning over the information contained in the image and world knowledge. Recently, large-scale Vision-and-Language Pre-trained Models (VLPMs) have become the mainstream approach to VQA tasks due to their superior performance. The standard practice is to fine-tune large-scale VLPMs, pre-trained on huge general-domain datasets, on domain-specific VQA datasets. However, in reality, the application domain can change over time, necessitating VLPMs to continually learn and adapt to new domains without forgetting previously acquired knowledge. Most existing continual learning (CL) research concentrates on unimodal tasks, whereas a more practical application scenario, i.e., CL on cross-domain VQA, has not been studied. Motivated by this, we introduce CL-CrossVQA, a rigorous Continual Learning benchmark for Cross-domain Visual Question Answering, through which we conduct extensive experiments on 4 VLPMs, 4 CL approaches, and 5 VQA datasets from different domains. In addition, by probing the forgetting phenomenon of the intermediate layers, we provide insights into how model architecture affects CL performance, why CL approaches can help mitigate forgetting in VLPMs to some extent, and how to design CL approaches suitable for VLPMs in this challenging continual learning environment. To facilitate future work on CL for cross-domain VQA, we will release our datasets and code.
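To make the evaluation setting concrete: in a benchmark of this kind, a VLPM is fine-tuned on a sequence of domain-specific VQA datasets, and after each task its accuracy is re-measured on every domain seen so far. The sketch below is not the authors' released code; it only illustrates, under the assumption that the standard average-accuracy and average-forgetting metrics from the continual learning literature are used, how such metrics would be computed from a per-task accuracy matrix. The domain sequence and accuracy numbers are hypothetical.

```python
# Minimal sketch (assumed protocol, not the CL-CrossVQA implementation):
# acc[t][j] is the accuracy on task j measured right after training on task t.
# From this matrix we compute the usual continual learning metrics:
# average accuracy after the final task, and average forgetting.

from typing import List


def average_accuracy(acc: List[List[float]]) -> float:
    """Mean accuracy over all tasks, evaluated after training on the last task."""
    final = acc[-1]
    return sum(final) / len(final)


def average_forgetting(acc: List[List[float]]) -> float:
    """Average drop from each task's best past accuracy to its final accuracy."""
    num_tasks = len(acc)
    if num_tasks < 2:
        return 0.0
    drops = []
    for j in range(num_tasks - 1):  # the last task cannot have been forgotten yet
        best_past = max(acc[t][j] for t in range(j, num_tasks - 1))
        drops.append(best_past - acc[-1][j])
    return sum(drops) / len(drops)


# Hypothetical accuracy matrix for a 3-domain sequence
# (e.g., medical -> art -> abstract-scene VQA); numbers are illustrative only.
acc_matrix = [
    [72.0],               # after task 0
    [61.5, 68.0],         # after task 1
    [55.0, 60.2, 70.5],   # after task 2
]

print(f"Average accuracy:   {average_accuracy(acc_matrix):.2f}")
print(f"Average forgetting: {average_forgetting(acc_matrix):.2f}")
```

Under this accounting, a CL approach that mitigates forgetting in a VLPM would show a smaller average-forgetting value than naive sequential fine-tuning while keeping average accuracy high.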