Paper Title
Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments
Paper Authors
Paper Abstract
Deep reinforcement learning (RL) agents are able to learn contact-rich manipulation tasks by maximizing a reward signal, but require large amounts of experience, especially in environments with many obstacles that complicate exploration. In contrast, motion planners use explicit models of the agent and environment to plan collision-free paths to faraway goals, but suffer from inaccurate models in tasks that require contact with the environment. To combine the benefits of both approaches, we propose motion planner augmented RL (MoPA-RL), which augments the action space of an RL agent with the long-horizon planning capabilities of motion planners. Based on the magnitude of the action, our approach smoothly transitions between directly executing the action and invoking a motion planner. We evaluate our approach on various simulated manipulation tasks and compare it to alternative action spaces in terms of learning efficiency and safety. The experiments demonstrate that MoPA-RL increases learning efficiency, leads to faster exploration, and results in safer policies that avoid collisions with the environment. Videos and code are available at https://clvrai.com/mopa-rl.
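To make the switching mechanism in the abstract concrete, the following is a minimal sketch of how an action-magnitude-based transition between direct execution and motion planning could look. It assumes a gym-style environment taking joint-displacement actions, a `motion_planner` object with a `plan(start, goal)` method, and a `DIRECT_ACTION_LIMIT` threshold; all of these names are illustrative assumptions, not the authors' actual API.

```python
import numpy as np

# Assumed per-joint displacement limit below which an action is executed
# directly; larger actions are delegated to the motion planner.
DIRECT_ACTION_LIMIT = 0.05

def execute_augmented_action(env, motion_planner, q_current, delta_q):
    """Execute a joint-displacement action delta_q from joint state q_current.

    Small actions go straight to the low-level controller (allowing
    contact-rich interaction); large actions are treated as subgoals
    for a collision-free motion plan.
    """
    if np.max(np.abs(delta_q)) <= DIRECT_ACTION_LIMIT:
        # Small action: execute directly as one environment step.
        return env.step(delta_q)

    # Large action: plan a collision-free joint-space path to the
    # subgoal q_current + delta_q, then follow it step by step.
    path = motion_planner.plan(start=q_current, goal=q_current + delta_q)
    obs, total_reward, done, info = None, 0.0, False, {}
    for q_next in path:
        obs, reward, done, info = env.step(q_next - q_current)
        total_reward += reward
        q_current = q_next
        if done:
            break
    return obs, total_reward, done, info
```

From the RL agent's perspective, the whole planned path is consumed as a single transition with an accumulated reward, which is one plausible way to realize the long-horizon action space the abstract describes.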