Paper Title
Eliciting Compatible Demonstrations for Multi-Human Imitation Learning
Paper Authors
Paper Abstract
Imitation learning from human-provided demonstrations is a strong approach for learning policies for robot manipulation. While the ideal dataset for imitation learning is homogeneous and low-variance -- reflecting a single, optimal method for performing a task -- natural human behavior has a great deal of heterogeneity, with several optimal ways to demonstrate a task. This multimodality is inconsequential to human users, with task variations manifesting as subconscious choices; for example, reaching down, then across to grasp an object, versus reaching across, then down. Yet, this mismatch presents a problem for interactive imitation learning, where sequences of users improve on a policy by iteratively collecting new, possibly conflicting demonstrations. To combat this problem of demonstrator incompatibility, this work designs an approach for 1) measuring the compatibility of a new demonstration given a base policy, and 2) actively eliciting more compatible demonstrations from new users. Across two simulation tasks requiring long-horizon, dexterous manipulation and a real-world "food plating" task with a Franka Emika Panda arm, we show that we can both identify incompatible demonstrations via post-hoc filtering and apply our compatibility measure to actively elicit compatible demonstrations from new users, leading to improved task success rates across simulated and real environments.