Paper Title
Active Perception and Representation for Robotic Manipulation
Paper Authors
Paper Abstract
The vast majority of visual animals actively control their eyes, heads, and/or bodies to direct their gaze toward different parts of their environment. In contrast, recent applications of reinforcement learning in robotic manipulation employ cameras as passive sensors. These are carefully placed to view a scene from a fixed pose. Active perception allows animals to gather the most relevant information about the world and focus their computational resources where needed. It also enables them to view objects from different distances and viewpoints, providing a rich visual experience from which to learn abstract representations of the environment. Inspired by the primate visual-motor system, we present a framework that leverages the benefits of active perception to accomplish manipulation tasks. Our agent uses viewpoint changes to localize objects, to learn state representations in a self-supervised manner, and to perform goal-directed actions. We apply our model to a simulated grasping task with a 6-DoF action space. Compared to its passive, fixed-camera counterpart, the active model achieves 8% better performance in targeted grasping. Compared to vanilla deep Q-learning algorithms, our model is at least four times more sample-efficient, highlighting the benefits of both active perception and representation learning.
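To make the pipeline described in the abstract concrete, the sketch below walks through one interaction step of an active-perception grasping agent: choose a viewpoint, encode the resulting observation into a state, and score candidate 6-DoF grasp actions with a Q-function. This is not the authors' code; the environment interface, pose format, encoder, and Q-function here are stand-ins assumed purely for illustration.

```python
# Minimal sketch (not the paper's implementation): one step of an active-perception
# grasping agent. All function names, shapes, and the 6-DoF action layout
# (x, y, z, roll, pitch, yaw) are assumptions made for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def render_view(camera_pose):
    """Stand-in for a simulator render call; returns a dummy RGB observation."""
    return rng.random((64, 64, 3))

def encode(observation):
    """Stand-in for a self-supervised encoder (e.g. trained on multi-view data);
    here it simply projects the flattened image to a small state vector."""
    w = rng.random((observation.size, 32))
    return observation.reshape(-1) @ w

def q_values(state, candidate_actions):
    """Stand-in Q-function scoring candidate 6-DoF grasp actions."""
    w = rng.random((state.size, len(candidate_actions)))
    return state @ w

# 1. Active perception: move the camera toward the region of interest before acting.
camera_pose = np.array([0.4, 0.0, 0.5, 0.0, np.pi / 4, 0.0])  # assumed pose format
observation = render_view(camera_pose)

# 2. Representation: encode the new viewpoint into a compact state.
state = encode(observation)

# 3. Action: score candidate 6-DoF grasps and execute the highest-valued one.
candidates = rng.uniform(-1.0, 1.0, size=(16, 6))
best_grasp = candidates[np.argmax(q_values(state, candidates))]
print("selected 6-DoF grasp:", np.round(best_grasp, 2))
```

In the actual framework, the viewpoint change, the learned representation, and the grasp policy are coupled (the abstract credits this coupling for the 8% gain over a fixed camera and the roughly fourfold sample-efficiency improvement over vanilla deep Q-learning); the sketch only separates the three stages to make the control flow explicit.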