Paper title
"What, not how": Solving an under-actuated insertion task from scratch
Paper authors
Paper abstract
Robot manipulation requires a complex set of skills that need to be carefully combined and coordinated to solve a task. Yet, most Reinforcement Learning (RL) approaches in robotics study tasks which actually consist of only a single manipulation skill, such as grasping an object or inserting a pre-grasped object. As a result, the skill ('how' to solve the task) is specified, but not the actual goal of a complete manipulation ('what' to solve). In contrast, we study a complex manipulation goal that requires an agent to learn and combine diverse manipulation skills. We propose a challenging, highly under-actuated peg-in-hole task with a free, rotationally asymmetric peg, requiring a broad range of manipulation skills. While correct peg (re-)orientation is a prerequisite for successful insertion, no reward is associated with it. Hence, an agent needs to understand this pre-condition and learn the skill to fulfil it. The final insertion reward is sparse, allowing freedom in the solution and leading to complex emergent behaviour not envisioned during task design. We tackle the problem in a multi-task RL framework using Scheduled Auxiliary Control (SAC-X) combined with Regularized Hierarchical Policy Optimization (RHPO), which successfully solves the task in simulation and from scratch on a single robot, where data is severely limited.