Human-in-the-Loop Imitation Learning using Remote Teleoperation

Paper Title

Human-in-the-Loop Imitation Learning using Remote Teleoperation

Authors

Ajay Mandlekar, Danfei Xu, Roberto Martín-Martín, Yuke Zhu, Li Fei-Fei, Silvio Savarese

Abstract

Imitation learning is a promising paradigm for learning complex robot manipulation skills by reproducing behavior from human demonstrations. However, manipulation tasks often contain bottleneck regions that require a sequence of precise actions to make meaningful progress, such as a robot inserting a pod into a coffee machine to make coffee. Trained policies can fail in these regions because small deviations in actions can lead the policy into states not covered by the demonstrations. Intervention-based policy learning is an alternative that can address this issue: it allows human operators to monitor trained policies and take over control when they encounter failures. In this paper, we build a data collection system tailored to 6-DoF manipulation settings that enables remote human operators to monitor and intervene on trained policies. We develop a simple and effective algorithm to train the policy iteratively on new data collected by the system, which encourages the policy to learn how to traverse bottlenecks through the interventions. We demonstrate that agents trained on data collected by our intervention-based system and algorithm outperform agents trained on an equivalent number of samples collected by non-interventional demonstrators. We further show that our method outperforms multiple state-of-the-art baselines for learning from human interventions on a challenging robot threading task and a coffee making task. Additional results and videos are available at https://sites.google.com/stanford.edu/iwr.
