Paper Title
Fleet-DAgger: Interactive Robot Fleet Learning with Scalable Human Supervision
Paper Authors
Paper Abstract
Commercial and industrial deployments of robot fleets at Amazon, Nimble, Plus One, Waymo, and Zoox query remote human teleoperators when robots are at risk or unable to make task progress. With continual learning, interventions from the remote pool of humans can also be used to improve the robot fleet control policy over time. A central question is how to effectively allocate limited human attention. Prior work addresses this in the single-robot, single-human setting; we formalize the Interactive Fleet Learning (IFL) setting, in which multiple robots interactively query and learn from multiple human supervisors. We propose Return on Human Effort (ROHE) as a new metric and Fleet-DAgger, a family of IFL algorithms. We present an open-source IFL benchmark suite of GPU-accelerated Isaac Gym environments for standardized evaluation and development of IFL algorithms. We compare a novel Fleet-DAgger algorithm to 4 baselines with 100 robots in simulation. We also perform a physical block-pushing experiment with 4 ABB YuMi robot arms and 2 remote humans. Experiments suggest that the allocation of humans to robots significantly affects the performance of the fleet, and that the novel Fleet-DAgger algorithm can achieve up to 8.8x higher ROHE than baselines. See https://tinyurl.com/fleet-dagger for supplemental material.
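The abstract introduces Return on Human Effort (ROHE) as a metric but does not state its formula. As a purely illustrative sketch (the paper's exact normalization may differ), one natural formalization is average per-robot reward normalized by total human supervision effort; the function name and the `1 +` smoothing term below are assumptions for illustration:

```python
def return_on_human_effort(fleet_rewards, human_action_counts):
    """Illustrative ROHE sketch (hypothetical formalization, not
    necessarily the paper's exact definition): average cumulative
    reward across the robot fleet, normalized by the total number
    of human intervention actions across all supervisors.
    fleet_rewards: cumulative task reward, one entry per robot.
    human_action_counts: intervention counts, one entry per human.
    """
    avg_reward = sum(fleet_rewards) / len(fleet_rewards)
    total_effort = sum(human_action_counts)
    # The +1 avoids division by zero when no human effort was expended.
    return avg_reward / (1 + total_effort)

# Example: 100 robots each accumulating reward 5.0,
# with 2 humans contributing 25 and 24 intervention actions.
print(return_on_human_effort([5.0] * 100, [25, 24]))  # → 0.1
```

Under this sketch, a higher ROHE means the fleet extracts more task performance per unit of scarce human attention, which is the trade-off the abstract's allocation problem targets.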