Eliciting Compatible Demonstrations for Multi-Human Imitation Learning

Paper Title

Eliciting Compatible Demonstrations for Multi-Human Imitation Learning

Authors

Kanishk Gandhi, Siddharth Karamcheti, Madeline Liao, Dorsa Sadigh

Abstract

Imitation learning from human-provided demonstrations is a strong approach for learning policies for robot manipulation. While the ideal dataset for imitation learning is homogenous and low-variance -- reflecting a single, optimal method for performing a task -- natural human behavior has a great deal of heterogeneity, with several optimal ways to demonstrate a task. This multimodality is inconsequential to human users, with task variations manifesting as subconscious choices; for example, reaching down, then across to grasp an object, versus reaching across, then down. Yet, this mismatch presents a problem for interactive imitation learning, where sequences of users improve on a policy by iteratively collecting new, possibly conflicting demonstrations. To combat this problem of demonstrator incompatibility, this work designs an approach for 1) measuring the compatibility of a new demonstration given a base policy, and 2) actively eliciting more compatible demonstrations from new users. Across two simulation tasks requiring long-horizon, dexterous manipulation and a real-world "food plating" task with a Franka Emika Panda arm, we show that we can both identify incompatible demonstrations via post-hoc filtering, and apply our compatibility measure to actively elicit compatible demonstrations from new users, leading to improved task success rates across simulated and real environments.
