Paper Title

From Easy to Hard: Learning Language-guided Curriculum for Visual Question Answering on Remote Sensing Data

Authors

Zhenghang Yuan, Lichao Mou, Qi Wang, Xiao Xiang Zhu

Abstract

Visual question answering (VQA) for remote sensing scenes has great potential in intelligent human-computer interaction systems. Although VQA in computer vision has been widely researched, VQA for remote sensing data (RSVQA) is still in its infancy. Two characteristics need special consideration in the RSVQA task. 1) No object annotations are available in RSVQA datasets, which makes it difficult for models to exploit informative region representations; 2) each image in the RSVQA task is paired with questions of clearly different difficulty levels, and directly training a model on questions in a random order may confuse it and limit performance. To address these two problems, this paper proposes a multi-level visual feature learning method to jointly extract language-guided holistic and regional image features. In addition, a self-paced curriculum learning (SPCL)-based VQA model is developed to train the network on samples in an easy-to-hard order. More specifically, a language-guided SPCL method with a soft weighting strategy is explored in this work. The proposed model is evaluated on three public datasets, and extensive experimental results show that the proposed RSVQA framework achieves promising performance.
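The abstract does not spell out the soft weighting rule. As a rough, hedged illustration of how a soft weighting strategy in self-paced curriculum learning is commonly realized, the sketch below assigns per-sample weights from training losses using a generic linear soft-weighting scheme; the function name `soft_spl_weights`, the pace parameter `lam`, and its schedule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def soft_spl_weights(losses, lam):
    """Generic linear soft self-paced weights (not the paper's exact rule):
    low-loss (easy) samples get weights near 1, samples with loss >= lam
    are excluded (weight 0), and weights decrease linearly in between."""
    losses = np.asarray(losses, dtype=np.float64)
    return np.clip(1.0 - losses / lam, 0.0, 1.0)

# Hypothetical usage: gradually raise the pace parameter so harder
# question-image pairs are admitted into training over time.
per_sample_losses = np.array([0.2, 0.9, 1.5, 3.0])
for epoch, lam in enumerate([1.0, 2.0, 4.0], start=1):
    w = soft_spl_weights(per_sample_losses, lam)
    print(f"epoch {epoch}, lambda={lam}: weights={np.round(w, 2)}")
```

In such a scheme, the weighted per-sample losses are summed to form the training objective, so early epochs are dominated by easy samples and harder ones contribute more as the pace parameter grows.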
