Paper Title

From proprioception to long-horizon planning in novel environments: A hierarchical RL model

Authors

Nishad Gothoskar, Miguel Lázaro-Gredilla, Dileep George

Abstract

For an intelligent agent to operate flexibly and efficiently in complex environments, it must be able to reason at multiple levels of temporal, spatial, and conceptual abstraction. At the lower levels, the agent must interpret its proprioceptive inputs and control its muscles; at the higher levels, it must select goals and plan how to achieve them. Each of these types of reasoning lends itself to different representations, algorithms, and inputs. In this work, we introduce a simple, three-level hierarchical architecture that reflects these distinctions. The low-level controller operates on the continuous proprioceptive inputs, using model-free learning to acquire useful behaviors. These behaviors in turn induce a set of mid-level dynamics, which the mid-level controller learns and uses for model-predictive control to select which behavior to activate at each timestep. The high-level controller leverages a discrete graph representation for goal selection and path planning, specifying targets for the mid-level controller. We apply our method to a series of navigation tasks in the MuJoCo Ant environment and consistently find significant improvements in sample efficiency compared to prior model-free, model-based, and hierarchical RL methods. Finally, as an illustrative example of the advantages of our architecture, we apply our method to a complex maze environment that requires efficient exploration and long-horizon planning.
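The abstract describes the mid and high levels only by their roles. What follows is a minimal, hypothetical Python sketch of how those two levels could fit together. It is not the authors' implementation. It assumes the low-level skills are already trained, and it replaces the learned mid-level dynamics model with a fixed (x, y) displacement per behavior (the `BEHAVIORS` table). Model-predictive control is done by exhaustively enumerating short behavior sequences, and the high-level graph is a 4-connected grid of free maze cells searched with breadth-first search. All names (`BEHAVIORS`, `predict`, `mpc_select`, `grid_graph`, `bfs_path`, `run_episode`) are illustrative.

```python
import itertools
import math
from collections import deque

# Hypothetical stand-in for the learned mid-level model: each pretrained
# low-level behavior is summarized by the (dx, dy) displacement of the agent's
# position after one activation. A learned model would be state-dependent.
BEHAVIORS = {"+x": (1, 0), "-x": (-1, 0), "+y": (0, 1), "-y": (0, -1)}


def predict(state, behavior):
    """Mid-level dynamics: predicted (x, y) after running one behavior."""
    dx, dy = BEHAVIORS[behavior]
    return (state[0] + dx, state[1] + dy)


def mpc_select(state, target, horizon=3):
    """Mid-level MPC: roll out every behavior sequence of length `horizon`
    with the model, score it by summed distance to `target`, and return
    only the first behavior of the best sequence (receding horizon)."""
    best_first, best_cost = None, math.inf
    for seq in itertools.product(BEHAVIORS, repeat=horizon):
        s, cost = state, 0.0
        for b in seq:
            s = predict(s, b)
            cost += math.dist(s, target)
        if cost < best_cost:
            best_first, best_cost = seq[0], cost
    return best_first


def grid_graph(free_cells):
    """High-level discrete graph: each free cell links to its free 4-neighbors."""
    free = set(free_cells)
    return {
        c: [n for n in (predict(c, b) for b in BEHAVIORS) if n in free]
        for c in free
    }


def bfs_path(graph, start, goal):
    """High-level planner: fewest-edge path from start to goal, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in graph[node]:
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    if goal not in parents:
        return None
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parents[node]
    return path[::-1]


def run_episode(graph, start, goal, max_steps=50):
    """Replan at the high level every step, hand the next waypoint to the
    mid-level MPC, and (as a simplification) apply the model's prediction
    instead of executing the low-level skill in a simulator."""
    state, trajectory = start, [start]
    for _ in range(max_steps):
        if state == goal:
            break
        path = bfs_path(graph, state, goal)
        if path is None:
            break  # goal unreachable on the graph
        behavior = mpc_select(state, target=path[1])
        state = predict(state, behavior)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    # U-shaped maze: cells (0, 1) and (1, 1) are walls.
    maze = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2)]
    print(run_episode(grid_graph(maze), start=(0, 0), goal=(0, 2)))
```

In the small U-shaped maze above, the straight line from (0, 0) to (0, 2) is blocked. The high-level BFS therefore routes the agent around the wall, and the mid-level MPC only ever has to reach an adjacent waypoint. Swapping in a learned, state-dependent model or a sampling-based optimizer would leave this division of labor unchanged.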
