Paper Title
IPPO: Obstacle Avoidance for Robotic Manipulators in Joint Space via Improved Proximal Policy Optimization
Paper Authors
Paper Abstract
Reaching tasks with random targets and obstacles are challenging for robotic manipulators. In this study, we propose a novel model-free reinforcement learning approach based on proximal policy optimization (PPO) for training a deep policy that maps the task space to the joint space of a 6-DoF manipulator. To facilitate training in a large workspace, we develop an efficient representation of the environmental inputs and outputs. The distances between obstacles and the manipulator links are computed with a geometry-based method and incorporated into the state representation. Additionally, to enhance performance on reaching tasks, we introduce an action-ensemble method and design the policy to participate directly in the value function updates of PPO. To overcome the challenges of training in real-robot environments, we develop a simulation environment in Gazebo, which produces a smaller Sim-to-Real gap than other simulators. However, training in Gazebo is time-intensive. To address this issue, we propose a Sim-to-Sim method that significantly reduces training time. The trained model is then applied directly to a real-robot setup without fine-tuning. To evaluate the proposed approach, we perform several rounds of experiments on both simulated and real robots and compare its performance against six baselines. The experimental results demonstrate the effectiveness of the proposed method on reaching tasks with and without obstacles; our method outperforms the selected baselines by a large margin across different reaching-task scenarios. A video of these experiments is attached to the paper as supplementary material.
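The abstract mentions folding obstacle-to-link distances into the state representation and aggregating actions via an ensemble. Below is a minimal illustrative sketch of one possible reading of those two ideas, not the authors' released code; all names (link_points, obstacle_center, ensemble_size, the spherical-obstacle assumption, the Gaussian stand-in policy) are hypothetical.

```python
"""Illustrative sketch: obstacle-aware state vector and an action-ensemble step.
Assumptions (not from the paper): obstacles are spheres, links are represented
by sampled points, and the policy outputs joint-space commands."""
import numpy as np


def link_obstacle_distances(link_points, obstacle_center, obstacle_radius):
    """Geometry-based surrogate: distance from sampled points on the
    manipulator links to the surface of a spherical obstacle."""
    return np.linalg.norm(link_points - obstacle_center, axis=1) - obstacle_radius


def build_state(joint_angles, target_xyz, ee_xyz, link_points,
                obstacle_center, obstacle_radius):
    """Concatenate joint angles, target and end-effector positions, and
    obstacle distances into a single observation vector for the policy."""
    d = link_obstacle_distances(link_points, obstacle_center, obstacle_radius)
    return np.concatenate([joint_angles, target_xyz, ee_xyz, d])


def action_ensemble(policy_sample, state, ensemble_size=5):
    """Average several stochastic action samples to reduce the variance of
    the executed joint command (one possible 'action ensemble' scheme)."""
    actions = np.stack([policy_sample(state) for _ in range(ensemble_size)])
    return actions.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy placeholders for a 6-DoF arm with four sampled link points.
    joint_angles = rng.uniform(-np.pi, np.pi, size=6)
    link_points = rng.uniform(-0.5, 0.5, size=(4, 3))
    state = build_state(joint_angles,
                        target_xyz=np.array([0.4, 0.1, 0.3]),
                        ee_xyz=np.array([0.2, 0.0, 0.5]),
                        link_points=link_points,
                        obstacle_center=np.array([0.3, 0.0, 0.4]),
                        obstacle_radius=0.05)
    # Stand-in stochastic policy: Gaussian joint-velocity noise around zero.
    policy = lambda s: rng.normal(0.0, 0.05, size=6)
    command = action_ensemble(policy, state)
    print(state.shape, command.shape)
```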