Paper Title


Can Open Domain Question Answering Systems Answer Visual Knowledge Questions?

Paper Authors

Jiawen Zhang, Abhijit Mishra, Avinesh P. V. S, Siddharth Patwardhan, Sachin Agarwal

Paper Abstract


The task of Outside Knowledge Visual Question Answering (OKVQA) requires an automatic system to answer natural language questions about pictures and images using external knowledge. We observe that many visual questions, which contain deictic referential phrases referring to entities in the image, can be rewritten as "non-grounded" questions and can be answered by existing text-based question answering systems. This allows for the reuse of existing text-based Open Domain Question Answering (QA) Systems for visual question answering. In this work, we propose a potentially data-efficient approach that reuses existing systems for (a) image analysis, (b) question rewriting, and (c) text-based question answering to answer such visual questions. Given an image and a question pertaining to that image (a visual question), we first extract the entities present in the image using pre-trained object and scene classifiers. Using these detected entities, the visual questions can be rewritten so as to be answerable by open domain QA systems. We explore two rewriting strategies: (1) an unsupervised method using BERT for masking and rewriting, and (2) a weakly supervised approach that combines adaptive rewriting and reinforcement learning techniques to use the implicit feedback from the QA system. We test our strategies on the publicly available OKVQA dataset and obtain a competitive performance with state-of-the-art models while using only 10% of the training data.
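The core rewriting idea in the abstract, replacing a deictic referential phrase (e.g., "this animal") with an entity detected in the image so the question becomes "non-grounded" and answerable by a text-based QA system, can be sketched as a toy substitution. This is only an illustration: the paper's actual strategies use BERT-based masking/rewriting and reinforcement learning, and the deictic phrase list and entity labels below are hypothetical.

```python
# Toy sketch of the question-rewriting step: substitute a deictic phrase
# in a visual question with the top entity detected in the image, yielding
# a "non-grounded" question for an open-domain text QA system.
# The phrase list and entities are illustrative, not from the paper.

DEICTIC_PHRASES = ["this animal", "this object", "this place", "this food"]

def rewrite_visual_question(question: str, detected_entities: list) -> str:
    """Replace the first deictic phrase found with the top detected entity."""
    if not detected_entities:
        return question  # nothing detected; leave the question unchanged
    top_entity = detected_entities[0]
    for phrase in DEICTIC_PHRASES:
        if phrase in question:
            return question.replace(phrase, f"the {top_entity}")
    return question  # no deictic phrase; question is already non-grounded

# Example: suppose the image classifiers (not shown) detected "giraffe".
print(rewrite_visual_question(
    "What continent is this animal native to?", ["giraffe"]))
# -> "What continent is the giraffe native to?"
```

The rewritten question can then be passed unchanged to any existing text-based open-domain QA system, which is the reuse the abstract describes.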
