Learning to Grasp on the Moon from 3D Octree Observations with Deep Reinforcement Learning

Paper title

Learning to Grasp on the Moon from 3D Octree Observations with Deep Reinforcement Learning

Authors

Andrej Orsula, Simon Bøgh, Miguel Olivares-Mendez, Carol Martinez

Abstract

Extraterrestrial rovers with a general-purpose robotic arm have many potential applications in lunar and planetary exploration. Introducing autonomy into such systems is desirable for increasing the time that rovers can spend gathering scientific data and collecting samples. This work investigates the applicability of deep reinforcement learning for vision-based robotic grasping of objects on the Moon. A novel simulation environment with procedurally-generated datasets is created to train agents under challenging conditions in unstructured scenes with uneven terrain and harsh illumination. A model-free off-policy actor-critic algorithm is then employed for end-to-end learning of a policy that directly maps compact octree observations to continuous actions in Cartesian space. Experimental evaluation indicates that 3D data representations enable more effective learning of manipulation skills when compared to traditionally used image-based observations. Domain randomization improves the generalization of learned policies to novel scenes with previously unseen objects and different illumination conditions. To this end, we demonstrate zero-shot sim-to-real transfer by evaluating trained agents on a real robot in a Moon-analogue facility.
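The abstract does not say how the octree observations are built or what each node stores. The following is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes a cubic workspace and a point cloud already expressed in that workspace's frame, and shows how such a cloud can be reduced to a sparse octree that keeps only occupied cells. This is one reason an octree is more compact than a dense voxel grid. The names `build_octree`, `bounds_min`, `size`, and `max_depth` are hypothetical.

```python
import math
from typing import Dict, Iterable, Set, Tuple

Point = Tuple[float, float, float]
Key = Tuple[int, int, int]


def build_octree(points: Iterable[Point],
                 bounds_min: Point,
                 size: float,
                 max_depth: int) -> Dict[int, Set[Key]]:
    """Return the occupied octree nodes at every depth (0 = root).

    Hypothetical helper: the workspace is assumed to be the half-open cube
    [bounds_min, bounds_min + size) on each axis; points outside are ignored.
    Only geometry (occupancy) is kept; per-node features are not modelled.
    """
    resolution = 2 ** max_depth  # number of leaf cells along each axis
    levels: Dict[int, Set[Key]] = {d: set() for d in range(max_depth + 1)}
    for p in points:
        # Integer leaf index along each axis (floor, so negatives stay negative)
        idx = [math.floor((c - lo) / size * resolution)
               for c, lo in zip(p, bounds_min)]
        if any(i < 0 or i >= resolution for i in idx):
            continue  # point lies outside the assumed workspace cube
        key: Key = (idx[0], idx[1], idx[2])
        # Mark the leaf and all its ancestors as occupied; a parent's index
        # is the child's index shifted right by one bit per level.
        for d in range(max_depth, -1, -1):
            levels[d].add(key)
            key = (key[0] >> 1, key[1] >> 1, key[2] >> 1)
    return levels


if __name__ == "__main__":
    cloud = [(0.05, 0.05, 0.05), (0.06, 0.04, 0.05), (0.95, 0.95, 0.10)]
    tree = build_octree(cloud, bounds_min=(0.0, 0.0, 0.0), size=1.0, max_depth=4)
    occupied_leaves = len(tree[4])
    dense_cells = (2 ** 4) ** 3
    print(occupied_leaves, dense_cells)  # prints "2 4096"
```

In this toy cloud, two nearby points fall into the same leaf, so only 2 of the 4,096 cells a dense grid of the same resolution would need are stored. A learned policy would additionally need features per node and a network that consumes the octree, and neither is sketched here.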
