Vision-based navigation and obstacle avoidance via deep reinforcement learning

Paper title

Vision-based navigation and obstacle avoidance via deep reinforcement learning

Authors

Blum, Paul; Crowley, Peter; Lykotrafitis, George

Abstract

Development of navigation algorithms is essential for the successful deployment of robots in rapidly changing hazardous environments for which prior knowledge of configuration is often limited or unavailable. Use of traditional path-planning algorithms, which are based on localization and require detailed obstacle maps with goal locations, is not possible. In this regard, vision-based algorithms hold great promise, as visual information can be readily acquired by a robot's onboard sensors and provides a much richer source of information from which deep neural networks can extract complex patterns. Deep reinforcement learning has been used to achieve vision-based robot navigation. However, the efficacy of these algorithms in environments with dynamic obstacles and high variation in the configuration space has not been thoroughly investigated. In this paper, we employ a deep Dyna-Q learning algorithm for room evacuation and obstacle avoidance in partially observable environments based on low-resolution raw image data from an onboard camera. We explore the performance of a robotic agent in environments containing no obstacles, convex obstacles, and concave obstacles, both static and dynamic. Obstacles and the exit are initialized in random positions at the start of each episode of reinforcement learning. Overall, we show that our algorithm and training approach can generalize learning for collision-free evacuation of environments with complex obstacle configurations. It is evident that the agent can navigate to a goal location while avoiding multiple static and dynamic obstacles, and can escape from a concave obstacle while searching for and navigating to the exit.
