Protective Policy Transfer

Paper Title

Protective Policy Transfer

Authors

Wenhao Yu, C. Karen Liu, Greg Turk

Abstract

Being able to transfer existing skills to new situations is a key capability when training robots to operate in unpredictable real-world environments. A successful transfer algorithm should not only minimize the number of samples that the robot needs to collect in the new environment, but also prevent the robot from damaging itself or the surrounding environment during the transfer process. In this work, we introduce a policy transfer algorithm for adapting robot motor skills to novel scenarios while minimizing serious failures. Our algorithm trains two control policies in the training environment: a task policy that is optimized to complete the task of interest, and a protective policy that is dedicated to keeping the robot away from unsafe events (e.g., falling to the ground). To decide which policy to use during execution, we learn a safety estimator model in the training environment that estimates a continuous safety level of the robot. When used with a set of thresholds, the safety estimator becomes a classifier for switching between the protective policy and the task policy. We evaluate our approach on four simulated robot locomotion problems and a 2D navigation problem, and show that our method can achieve successful transfer to notably different environments while taking the robot's safety into consideration.
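
The switching rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name SwitchingController, the use of exactly two thresholds (low and high, forming a hysteresis band), their default values, and the callable interfaces for the policies and the safety estimator are all assumptions made for illustration. The abstract states only that a set of thresholds turns a continuous safety estimate into a classifier that selects between the task policy and the protective policy.

```python
from typing import Any, Callable, Sequence

State = Sequence[float]


class SwitchingController:
    """Hypothetical two-threshold switch between a task and a protective policy.

    Assumption: the safety estimator returns a scalar where larger means safer.
    Assumption: two thresholds are used; the paper only says "a set of thresholds".
    """

    def __init__(
        self,
        task_policy: Callable[[State], Any],
        protective_policy: Callable[[State], Any],
        safety_estimator: Callable[[State], float],
        low: float = 0.3,   # illustrative value, not from the paper
        high: float = 0.6,  # illustrative value, not from the paper
    ) -> None:
        if low > high:
            raise ValueError("low threshold must not exceed high threshold")
        self.task_policy = task_policy
        self.protective_policy = protective_policy
        self.safety_estimator = safety_estimator
        self.low = low
        self.high = high
        self.protecting = False  # start under the task policy

    def act(self, state: State) -> Any:
        safety = self.safety_estimator(state)
        if self.protecting:
            # Hand control back only once the estimate has recovered to `high`.
            if safety >= self.high:
                self.protecting = False
        elif safety < self.low:
            # Estimated safety dropped below `low`: let the protective policy act.
            self.protecting = True
        policy = self.protective_policy if self.protecting else self.task_policy
        return policy(state)
```

In this sketch the gap between the two thresholds keeps the controller from oscillating between policies when the safety estimate hovers near a single cutoff. Whether the paper uses the same scheme is not stated in the abstract.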
