Paper Title
Visual Question Answering Using Semantic Information from Image Descriptions
Paper Authors
Paper Abstract
In this work, we propose a deep neural architecture for the visual question answering (VQA) task that produces open-ended answers to questions about an image. Its attention mechanism draws on three inputs: region-based image features, the natural language question, and semantic knowledge extracted from the regions of the image. Combining region-based visual features with region-based textual information about the image helps the model answer questions more accurately, and potentially with less training data. We evaluate the proposed architecture on a VQA task against a strong baseline and show that our method achieves excellent results on this task.
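
As a rough illustration of the idea in the abstract, the sketch below shows one simple way question-guided attention over image regions could combine visual features with per-region text embeddings. It is not the paper's architecture: the function names (`attend_regions`, `softmax`, `dot`), the fusion by concatenation, and the plain dot-product scoring are all assumptions made for this example, and a real model would use learned projections and trained embeddings.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend_regions(question_vec, region_visual, region_text):
    """Hypothetical question-guided attention over image regions.

    Each region is represented by its visual features concatenated with an
    embedding of its textual description (an assumed fusion scheme). The
    question vector is assumed to have the same length as the fused vector.
    Returns the attention weights and the attended context vector.
    """
    # Fuse the two per-region sources by concatenation.
    fused = [list(v) + list(t) for v, t in zip(region_visual, region_text)]
    # Score each region by its similarity to the question.
    weights = softmax([dot(question_vec, f) for f in fused])
    # The context vector is the attention-weighted sum of the fused regions.
    dim = len(fused[0])
    context = [sum(w * f[i] for w, f in zip(weights, fused)) for i in range(dim)]
    return weights, context


# Toy inputs: two regions, 2-d visual features, 2-d text embeddings.
question = [1.0, 0.0, 1.0, 0.0]
visual = [[1.0, 0.0], [0.0, 1.0]]
text = [[1.0, 0.0], [0.0, 1.0]]
weights, context = attend_regions(question, visual, text)
```

In this toy setup the first region matches the question on both its visual and its textual half, so it receives the larger attention weight. In a full model, the context vector would then feed an answer decoder or classifier.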