Paper Title
Regularized Off-Policy TD-Learning
Paper Authors
Paper Abstract
We present a novel $l_1$-regularized off-policy convergent TD-learning method (termed RO-TD), which learns sparse representations of value functions with low computational complexity. The algorithmic framework underlying RO-TD integrates two key ideas: off-policy convergent gradient TD methods, such as TDC, and a convex-concave saddle-point formulation of non-smooth convex optimization, which enables first-order solvers and feature selection using online convex regularization. A detailed theoretical and experimental analysis of RO-TD is presented, with experiments illustrating the algorithm's off-policy convergence, sparse feature selection capability, and low computational cost.
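To make the two ingredients concrete, the sketch below combines a TDC-style two-timescale off-policy update with a soft-thresholding (proximal $l_1$) step that produces sparse weights. This is a rough illustration of the flavor of algorithm the abstract describes, not the paper's exact RO-TD method: RO-TD handles the non-smooth penalty through a convex-concave saddle-point formulation (presumably exploiting the standard dual representation $\|\theta\|_1 = \max_{\|u\|_\infty \le 1} u^\top \theta$) rather than a proximal step, and all function names, step sizes, and parameters here are assumptions.

import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1 (componentwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def l1_tdc_step(theta, w, phi, phi_next, reward,
                gamma=0.99, alpha=0.01, beta=0.05, rho=1.0, lam=0.01):
    """One TDC-style off-policy update followed by an l1 proximal step.

    theta    : value-function weights (the sparse target of learning)
    w        : auxiliary weights of the two-timescale TDC correction
    phi      : feature vector of the current state
    phi_next : feature vector of the next state
    rho      : importance-sampling ratio for the off-policy correction
    lam      : l1 regularization strength (hypothetical setting)
    """
    # TD error under the current value estimate
    delta = reward + gamma * (phi_next @ theta) - phi @ theta
    # TDC main update: the gradient-correction term removes the bias
    # that makes naive off-policy TD divergent
    theta = theta + alpha * rho * (delta * phi - gamma * (phi @ w) * phi_next)
    # Soft-thresholding enforces the l1 penalty, yielding sparse theta
    theta = soft_threshold(theta, alpha * lam)
    # Auxiliary weights track the projection of the TD error onto features;
    # beta > alpha gives the faster timescale of the two-timescale scheme
    w = w + beta * rho * (delta - phi @ w) * phi
    return theta, w

Each step costs $O(d)$ in the number of features $d$, which is consistent with the low per-step computational cost the abstract claims for first-order methods of this kind.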