Paper Title
Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient
Paper Authors
Paper Abstract
Deep Q-learning algorithms often suffer from poor gradient estimates with excessive variance, resulting in unstable training and poor sampling efficiency. Stochastic variance-reduced gradient methods such as SVRG have been applied to reduce the estimation variance (Zhao et al., 2019). However, because reinforcement learning generates training instances online, directly applying SVRG to deep Q-learning suffers from inaccurate estimation of the anchor points, which severely limits the potential of SVRG. To address this issue, and inspired by the recursive gradient variance-reduction algorithm SARAH (Nguyen et al., 2017), this paper introduces a recursive framework for updating the stochastic gradient estimate in deep Q-learning, yielding a novel algorithm called SRG-DQN. Unlike SVRG-based algorithms, SRG-DQN updates its stochastic gradient estimate recursively. The parameters are updated along a direction accumulated from past stochastic gradient information, which removes the need to estimate full gradients as anchors. Additionally, SRG-DQN incorporates the Adam process to further accelerate training. Theoretical analysis and experimental results on well-known reinforcement learning tasks demonstrate the efficiency and effectiveness of the proposed SRG-DQN algorithm.
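To make the recursive update described above concrete, here is a minimal sketch (not the authors' released code) of a SARAH-style recursive gradient estimate applied to a Q-learning TD loss. The network size, the randomly generated toy transitions, and all hyperparameters are illustrative assumptions, and the Adam acceleration mentioned in the abstract is omitted for brevity.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical Q-network for a 4-dimensional state and 2 actions.
q_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
lr, gamma = 1e-3, 0.99


def td_loss(net, batch):
    """Mean squared one-step TD error on a minibatch of transitions."""
    s, a, r, s_next, done = batch
    q_sa = net(s).gather(1, a).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * net(s_next).max(1).values
    return ((q_sa - target) ** 2).mean()


def grads(net, batch):
    """Gradient of the TD loss w.r.t. all network parameters."""
    return torch.autograd.grad(td_loss(net, batch), list(net.parameters()))


def toy_batch(n=8):
    """Random stand-in for transitions sampled from a replay buffer."""
    return (torch.randn(n, 4), torch.randint(0, 2, (n, 1)),
            torch.randn(n), torch.randn(n, 4), torch.zeros(n))


# v_0 is a plain stochastic gradient: the recursion never needs the
# full-gradient anchor that SVRG recomputes at every epoch.
v = [g.clone() for g in grads(q_net, toy_batch())]

for t in range(20):
    prev_net = copy.deepcopy(q_net)            # keep w_{t-1}
    with torch.no_grad():                      # w_t = w_{t-1} - lr * v_{t-1}
        for p, g in zip(q_net.parameters(), v):
            p -= lr * g
    batch = toy_batch()                        # one minibatch, reused twice
    g_new = grads(q_net, batch)                # gradient at w_t
    g_old = grads(prev_net, batch)             # gradient at w_{t-1}
    # SARAH-style recursion: v_t = grad(w_t) - grad(w_{t-1}) + v_{t-1}
    v = [gn - go + vv for gn, go, vv in zip(g_new, g_old, v)]
```

In the full algorithm the abstract describes, a recursive estimate of this kind would replace the plain minibatch gradient in the DQN training loop, and the Adam process would then rescale the update direction before each parameter step.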