Paper Title
Rapid Task-Solving in Novel Environments
Paper Authors
Paper Abstract
We propose the challenge of rapid task-solving in novel environments (RTS), wherein an agent must solve a series of tasks as rapidly as possible in an unfamiliar environment. An effective RTS agent must balance between exploring the unfamiliar environment and solving its current task, all while building a model of the new environment over which it can plan when faced with later tasks. While modern deep RL agents exhibit some of these abilities in isolation, none are suitable for the full RTS challenge. To enable progress toward RTS, we introduce two challenge domains: (1) a minimal RTS challenge called the Memory&Planning Game and (2) One-Shot StreetLearn Navigation, which introduces scale and complexity from real-world data. We demonstrate that state-of-the-art deep RL agents fail at RTS in both domains, and that this failure is due to an inability to plan over gathered knowledge. We develop Episodic Planning Networks (EPNs) and show that deep-RL agents with EPNs excel at RTS, outperforming the nearest baseline by factors of 2-3 and learning to navigate held-out StreetLearn maps within a single episode. We show that EPNs learn to execute a value iteration-like planning algorithm and that they generalize to situations beyond their training experience.
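The abstract notes that EPNs learn to execute a value iteration-like planning algorithm. For reference, classic tabular value iteration can be sketched as follows. This is a minimal illustration over a hypothetical deterministic MDP representation (a dict mapping each state to its available actions), not the paper's learned EPN procedure:

```python
def value_iteration(mdp, gamma=0.9, tol=1e-6):
    """Tabular value iteration: repeat Bellman backups until convergence.

    mdp: dict mapping state -> {action: (next_state, reward)};
         states with no actions are treated as terminal.
    """
    values = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s, actions in mdp.items():
            if not actions:  # terminal state keeps value 0
                continue
            # Bellman optimality backup over available actions
            best = max(r + gamma * values[ns] for ns, r in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values

# Toy example: a 3-state chain A -> B -> goal, reward 1 on reaching the goal.
mdp = {
    "A": {"right": ("B", 0.0)},
    "B": {"right": ("goal", 1.0)},
    "goal": {},
}
vals = value_iteration(mdp)
# Converged values: V(goal)=0.0, V(B)=1.0, V(A)=0.9 (one step discounted)
```

A greedy policy with respect to the converged values then yields shortest reward-maximizing paths, which is the kind of computation the paper reports EPNs approximating over knowledge gathered within an episode.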