Active Visual Information Gathering for Vision-Language Navigation

Paper Title

Active Visual Information Gathering for Vision-Language Navigation

Authors

Hanqing Wang, Wenguan Wang, Tianmin Shu, Wei Liang, Jianbing Shen

Abstract

Vision-language navigation (VLN) is the task of requiring an agent to carry out navigational instructions inside photo-realistic environments. One of the key challenges in VLN is how to navigate robustly by mitigating the uncertainty caused by ambiguous instructions and insufficient observation of the environment. Agents trained by current approaches typically suffer from this uncertainty and consequently struggle to avoid random and inefficient actions at every step. In contrast, when humans face such a challenge, they can still maintain robust navigation by actively exploring the surroundings to gather more information and thus make more confident navigation decisions. This work draws inspiration from human navigation behavior and endows an agent with an active information gathering ability for a more intelligent vision-language navigation policy. To achieve this, we propose an end-to-end framework for learning an exploration policy that decides i) when and where to explore, ii) what information is worth gathering during exploration, and iii) how to adjust the navigation decision after the exploration. The experimental results show that promising exploration strategies emerge from training, which leads to a significant boost in navigation performance. On the R2R challenge leaderboard, our agent achieves promising results in all three VLN settings, i.e., single run, pre-exploration, and beam search.
