Paper Title
Closing the gap between SVRG and TD-SVRG with Gradient Splitting
Paper Authors
Paper Abstract
Temporal difference (TD) learning is a policy evaluation method in reinforcement learning whose performance can be enhanced by variance reduction techniques. Recently, multiple works have sought to fuse TD learning with the Stochastic Variance Reduced Gradient (SVRG) method to achieve a geometric rate of convergence. However, the resulting convergence rates are significantly weaker than what SVRG achieves in the setting of convex optimization. In this work, we utilize a recent interpretation of TD learning as the splitting of the gradient of an appropriately chosen function, which simplifies the algorithm and its fusion with SVRG. Our main result is a geometric convergence bound with a predetermined learning rate of $1/8$, identical to the convergence bound available for SVRG in the convex setting. Our theoretical findings are supported by a set of experiments.
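To make the fusion described above concrete, the following is a minimal sketch of SVRG-style variance-reduced TD(0) updates with linear function approximation and the fixed learning rate of $1/8$ mentioned in the abstract. It is an illustrative approximation of the general scheme, not the paper's exact TD-SVRG algorithm; the function names (`td_direction`, `td_svrg`), the transition-buffer representation, and all hyperparameters other than the step size are assumptions introduced here for the example.

```python
import numpy as np


def td_direction(theta, phi_s, phi_next, r, gamma):
    """Semi-gradient TD(0) update direction for a single transition (illustrative)."""
    td_error = r + gamma * phi_next @ theta - phi_s @ theta
    return td_error * phi_s


def td_svrg(transitions, dim, gamma=0.95, alpha=1 / 8, epochs=20, seed=0):
    """SVRG-style variance-reduced TD(0) sketch.

    transitions: list of (phi_s, phi_next, r) tuples built from sampled data
                 (a hypothetical buffer of observed feature vectors and rewards).
    alpha: fixed learning rate of 1/8, as in the abstract.
    """
    rng = np.random.default_rng(seed)
    n = len(transitions)
    theta = np.zeros(dim)
    for _ in range(epochs):
        # Anchor point: full (batch) TD direction over all stored transitions.
        theta_tilde = theta.copy()
        full_dir = np.mean(
            [td_direction(theta_tilde, p, pn, r, gamma) for p, pn, r in transitions],
            axis=0,
        )
        # Inner loop: SVRG-corrected stochastic TD updates around the anchor.
        for _ in range(n):
            p, pn, r = transitions[rng.integers(n)]
            corrected = (
                td_direction(theta, p, pn, r, gamma)
                - td_direction(theta_tilde, p, pn, r, gamma)
                + full_dir
            )
            theta = theta + alpha * corrected
    return theta
```

The variance reduction comes from the corrected term: the stochastic TD direction at the current iterate is recentered by its value at the anchor plus the full batch direction, so the update's variance shrinks as the iterate approaches the anchor, which is what enables a geometric convergence rate with a constant step size.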