Paper Title
Online Attentive Kernel-Based Temporal Difference Learning
Paper Authors
Abstract
With rising uncertainty in the real world, online Reinforcement Learning (RL) has been receiving increasing attention due to its fast learning capability and improved data efficiency. However, online RL often suffers from complex Value Function Approximation (VFA) and catastrophic interference, which makes it difficult to apply deep neural networks to online RL algorithms in a fully online setting. Therefore, a simpler and more adaptive approach is introduced to evaluate the value function with a kernel-based model. Sparse representations are superior at handling interference; compared with current sparse representation methods, a competitive sparse representation should be learnable, non-prior, non-truncated and explicit. Moreover, to learn sparse representations, attention mechanisms are utilized to represent the degree of sparsification, and a smooth attentive function is introduced into the kernel-based VFA. In this paper, we propose an Online Attentive Kernel-Based Temporal Difference (OAKTD) algorithm using two-timescale optimization and provide a convergence analysis of the proposed algorithm. Experimental evaluations show that OAKTD outperforms several Online Kernel-based Temporal Difference (OKTD) learning algorithms as well as the Temporal Difference (TD) learning algorithm with Tile Coding on the public Mountain Car, Acrobot, CartPole and Puddle World tasks.