C3VQG: Category Consistent Cyclic Visual Question Generation

Paper Title

C3VQG: Category Consistent Cyclic Visual Question Generation

Authors

Shagun Uppal, Anish Madan, Sarthak Bhagat, Yi Yu, Rajiv Ratn Shah

Abstract

Visual Question Generation (VQG) is the task of generating natural questions based on an image. Popular methods in the past have explored image-to-sequence architectures trained with maximum likelihood, which generate meaningful questions given an image and its associated ground-truth answer. VQG becomes more challenging if the image contains rich contextual information describing its different semantic categories. In this paper, we exploit the different visual cues and concepts in an image to generate questions using a variational autoencoder (VAE) without ground-truth answers. Our approach addresses two major shortcomings of existing VQG systems: (i) it minimizes the level of supervision and (ii) it replaces generic questions with category-relevant ones. Most importantly, eliminating expensive answer annotations weakens the required supervision. Using different categories enables us to exploit different concepts, as inference requires only the image and the category. Mutual information is maximized between the image, question, and answer category in the latent space of our VAE. A novel category consistent cyclic loss is proposed to enable the model to generate consistent predictions with respect to the answer category, reducing redundancies and irregularities. Additionally, we impose supplementary constraints on the latent space of our generative model to provide structure based on categories and to enhance generalization by encapsulating decorrelated features within each dimension. Through extensive experiments, the proposed model, C3VQG, outperforms state-of-the-art VQG methods with weak supervision.
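To make the training objective described in the abstract more concrete, the following is a minimal sketch of how a VAE objective might be combined with a category consistency term. It is not the authors' implementation: the abstract does not give the form of the loss, so the cross-entropy formulation, the function names (`category_consistency_loss`, `gaussian_kl`, `total_loss`), and the weights `kl_weight` and `cycle_weight` are all assumptions made for illustration. The idea shown is that a category is re-predicted from the generated question and penalized when it disagrees with the category that conditioned the generation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def category_consistency_loss(predicted_logits, target_category):
    """Assumed form of a cyclic consistency term: cross-entropy between the
    category re-predicted from a generated question (as logits) and the
    category index that conditioned the generation."""
    probs = softmax(predicted_logits)
    return -math.log(probs[target_category])


def gaussian_kl(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I),
    the standard VAE latent regularizer."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def total_loss(reconstruction_nll, mu, log_var, predicted_logits,
               target_category, kl_weight=1.0, cycle_weight=1.0):
    """Hypothetical combined objective: question reconstruction NLL,
    plus weighted KL term, plus weighted category consistency term."""
    return (
        reconstruction_nll
        + kl_weight * gaussian_kl(mu, log_var)
        + cycle_weight * category_consistency_loss(
            predicted_logits, target_category
        )
    )
```

In this sketch, a generated question whose re-predicted category matches the conditioning category yields a small consistency term, while a mismatch yields a large one. The mutual information objective and the additional latent-space constraints mentioned in the abstract are not modeled here.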
