Paper Title
Memristor Hardware-Friendly Reinforcement Learning
Paper Authors
Paper Abstract
Recently, significant progress has been made in solving sophisticated problems across various domains by using reinforcement learning (RL), which allows machines or agents to learn from interactions with environments rather than from explicit supervision. As the end of Moore's law appears imminent, emerging technologies that enable high-performance neuromorphic hardware systems are attracting increasing attention. In particular, neuromorphic architectures that leverage memristors, programmable and nonvolatile two-terminal devices, as synaptic weights in hardware neural networks are candidates of choice for realizing such highly energy-efficient and complex nervous systems. However, one of the challenges for memristive hardware with integrated learning capabilities is the prohibitively large number of write cycles that might be required during the learning process, a situation that is further exacerbated in RL settings. In this work we propose a memristive neuromorphic hardware implementation of the actor-critic algorithm in RL. By introducing a two-fold training procedure (i.e., ex-situ pre-training and in-situ re-training) and several training techniques, the number of weight updates can be significantly reduced, making the approach suitable for efficient in-situ learning implementations. As a case study, we consider the task of balancing an inverted pendulum, a classical problem in both RL and control theory. We believe this study demonstrates the promise of using memristor-based hardware neural networks for handling complex tasks through in-situ reinforcement learning.
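To make the setting concrete, the following is a minimal software sketch of the actor-critic loop on the inverted-pendulum (cart-pole) task described in the abstract. This is an illustrative model only: the linear function approximation, the dynamics constants, the learning rates, and especially the `update_threshold` trick for skipping small weight updates (a stand-in for the paper's techniques that reduce memristor write cycles) are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

# Standard cart-pole constants (illustrative values, not from the paper).
G, M_CART, M_POLE, L, DT = 9.8, 1.0, 0.1, 0.5, 0.02

def step(state, force):
    """One Euler step of the classic cart-pole dynamics."""
    x, x_dot, th, th_dot = state
    total_m = M_CART + M_POLE
    temp = (force + M_POLE * L * th_dot**2 * np.sin(th)) / total_m
    th_acc = (G * np.sin(th) - np.cos(th) * temp) / (
        L * (4.0 / 3.0 - M_POLE * np.cos(th)**2 / total_m))
    x_acc = temp - M_POLE * L * th_acc * np.cos(th) / total_m
    return np.array([x + DT * x_dot, x_dot + DT * x_acc,
                     th + DT * th_dot, th_dot + DT * th_acc])

def failed(s):
    # Episode ends when the cart leaves the track or the pole tips too far.
    return abs(s[0]) > 2.4 or abs(s[2]) > 12 * np.pi / 180

rng = np.random.default_rng(0)
w_critic = np.zeros(4)       # linear state-value weights
w_actor = np.zeros(4)        # linear policy weights (logit of pushing right)
alpha, beta, gamma = 0.05, 0.05, 0.95
update_threshold = 1e-3      # skip tiny updates -> fewer device "writes"
writes = 0

for episode in range(100):
    s = rng.uniform(-0.05, 0.05, size=4)
    for t in range(500):
        # Actor: Bernoulli policy over {push left, push right}.
        p_right = 1.0 / (1.0 + np.exp(-w_actor @ s))
        a = 1 if rng.random() < p_right else 0
        s_next = step(s, 10.0 if a == 1 else -10.0)
        done = failed(s_next)
        r = -1.0 if done else 0.0
        # Critic: TD(0) error on the linear value estimate.
        td = r + (0.0 if done else gamma * (w_critic @ s_next)) - w_critic @ s
        dw_c = alpha * td * s
        dw_a = beta * td * (a - p_right) * s
        # In a memristive implementation every nonzero update costs a device
        # write cycle, so updates below a threshold are skipped here.
        if np.abs(dw_c).max() > update_threshold:
            w_critic += dw_c
            writes += 1
        if np.abs(dw_a).max() > update_threshold:
            w_actor += dw_a
            writes += 1
        s = s_next
        if done:
            break
```

Counting `writes` rather than gradient steps mirrors the abstract's concern: on memristive hardware, the cost that matters is the number of device write cycles, which techniques such as pre-training and update thresholding aim to keep small.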