Paper Title
Reinforcement co-Learning of Deep and Spiking Neural Networks for Energy-Efficient Mapless Navigation with Neuromorphic Hardware
Paper Authors
Paper Abstract
Energy-efficient mapless navigation is crucial for mobile robots as they explore unknown environments with limited on-board resources. Although recent deep reinforcement learning (DRL) approaches have been successfully applied to navigation, their high energy consumption limits their use in several robotic applications. Here, we propose a neuromorphic approach that combines the energy efficiency of spiking neural networks with the optimality of DRL, and benchmark it in learning control policies for mapless navigation. Our hybrid framework, spiking deep deterministic policy gradient (SDDPG), consists of a spiking actor network (SAN) and a deep critic network, where the two networks were trained jointly using gradient descent. The co-learning enabled synergistic information exchange between the two networks, allowing them to overcome each other's limitations through shared representation learning. To evaluate our approach, we deployed the trained SAN on Intel's Loihi neuromorphic processor. When validated on simulated and real-world complex environments, our method on Loihi consumed 75 times less energy per inference compared to DDPG on a Jetson TX2, and also exhibited a higher rate of successful navigation to the goal, which ranged from 1% to 4.2% depending on the forward-propagation timestep size. These results reinforce our ongoing efforts to design brain-inspired algorithms for controlling autonomous robots with neuromorphic hardware.
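To make the actor side of the framework concrete, the following is a minimal sketch (not the paper's actual implementation) of a rate-coded spiking actor: a two-layer network of leaky integrate-and-fire (LIF) neurons is run for T forward-propagation timesteps, and the output spike counts are decoded into a continuous action. The function name, weight shapes, and LIF parameters (`v_th`, `decay`) are all illustrative assumptions; the abstract's accuracy/latency trade-off corresponds to the choice of `T`.

```python
import numpy as np

def lif_spiking_actor(obs, W1, W2, T=5, v_th=1.0, decay=0.5):
    """Illustrative rate-coded spiking actor (NOT the paper's exact model).

    obs : observation vector fed as constant input current for T timesteps
    W1  : input -> hidden weights, shape (n_hidden, n_obs)
    W2  : hidden -> output weights, shape (n_act, n_hidden)
    Returns output firing rates in [0, 1], decoded as the action.
    """
    v_hid = np.zeros(W1.shape[0])      # hidden-layer membrane potentials
    v_out = np.zeros(W2.shape[0])      # output-layer membrane potentials
    spike_count = np.zeros(W2.shape[0])
    for _ in range(T):
        # Leaky integration of input current, then threshold-and-reset.
        v_hid = decay * v_hid + W1 @ obs
        s_hid = (v_hid >= v_th).astype(float)   # hidden spikes
        v_hid = v_hid * (1.0 - s_hid)           # reset fired neurons
        v_out = decay * v_out + W2 @ s_hid
        s_out = (v_out >= v_th).astype(float)   # output spikes
        v_out = v_out * (1.0 - s_out)
        spike_count += s_out
    return spike_count / T             # firing rate -> continuous action
```

In the SDDPG setup described above, such an actor would be trained jointly with a conventional deep critic; since the hard threshold is non-differentiable, training spiking networks with gradient descent typically relies on a surrogate gradient, which is omitted here for brevity.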