Paper Title

Reinforcement Learning with Feedback-modulated TD-STDP

Authors

Stephen Chung, Robert Kozma

Abstract

Spiking neuron networks have been used successfully to solve simple reinforcement learning tasks with a continuous action set, applying learning rules based on spike-timing-dependent plasticity (STDP). However, most of these models cannot be applied to reinforcement learning tasks with a discrete action set, since they assume that the selected action is a deterministic function of the firing rates of neurons, which are continuous. In this paper, we propose a new STDP-based learning rule for spiking neuron networks that incorporates feedback modulation. We show that this STDP-based learning rule can solve reinforcement learning tasks with a discrete action set at a speed similar to that of standard reinforcement learning algorithms when applied to the CartPole and LunarLander tasks. Moreover, we demonstrate that the agent is unable to solve these tasks if feedback modulation is omitted from the learning rule. We conclude that feedback modulation allows better credit assignment when only the units contributing to the executed action and the TD error participate in learning.
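The abstract does not spell out the update rule, but the mechanism it describes, an STDP eligibility trace gated by a per-unit feedback signal and scaled by the TD error, might look roughly like the NumPy sketch below. The function name, array shapes, simple pre/post coincidence kernel, and the binary feedback gate are all illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def td_stdp_update(w, eligibility, pre_spikes, post_spikes,
                   td_error, feedback_gate, lr=1e-3, decay=0.9):
    """One synaptic update for a feedback-modulated TD-STDP rule (sketch).

    w            : (n_post, n_pre) weight matrix
    eligibility  : (n_post, n_pre) STDP eligibility trace
    pre_spikes   : (n_pre,)  0/1 presynaptic spikes this step
    post_spikes  : (n_post,) 0/1 postsynaptic spikes this step
    td_error     : scalar TD error, delta = r + gamma * V(s') - V(s)
    feedback_gate: (n_post,) 1 for units that contributed to the
                   executed action, 0 otherwise (hypothetical feedback signal)
    """
    # Decay the trace, then add a simple pre/post coincidence term
    # standing in for the STDP kernel (an illustrative simplification).
    eligibility = decay * eligibility + np.outer(post_spikes, pre_spikes)
    # Feedback modulation: only gated units learn, and the update is
    # scaled by the TD error, so credit is assigned only to units that
    # both drove the chosen action and were followed by a value surprise.
    w = w + lr * td_error * feedback_gate[:, None] * eligibility
    return w, eligibility
```

With the gate fixed to all ones, this reduces to an ordinary reward-modulated STDP update in which every unit learns from the TD error, which is the ablation the abstract reports as failing on these tasks.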
