Paper Title
COIN: Counterfactual Image Generation for VQA Interpretation
Paper Authors
Paper Abstract
Owing to significant advances in Natural Language Processing and Computer Vision models, Visual Question Answering (VQA) systems are becoming more intelligent and advanced. However, they are still error-prone when dealing with relatively complex questions. Therefore, it is important to understand the behaviour of VQA models before adopting their results. In this paper, we introduce an interpretability approach for VQA models based on generating counterfactual images. Specifically, the generated image should change the original image as little as possible while leading the VQA model to give a different answer. In addition, our approach ensures that the generated image is realistic. Since quantitative metrics cannot be employed to evaluate the interpretability of the model, we carried out a user study to assess different aspects of our approach. In addition to interpreting the results of VQA models on single images, the obtained results and the discussion provide an extensive explanation of VQA models' behaviour.