Accelerating Primal-dual Methods for Regularized Markov Decision Processes

Paper title

Accelerating Primal-dual Methods for Regularized Markov Decision Processes

Authors

Haoya Li, Hsiang-fu Yu, Lexing Ying, Inderjit Dhillon

Abstract

Entropy-regularized Markov decision processes have been widely used in reinforcement learning. This paper is concerned with the primal-dual formulation of entropy-regularized problems. Standard first-order methods suffer from slow convergence due to the lack of strict convexity and concavity. To address this issue, we first introduce a new quadratically convexified primal-dual formulation. The natural gradient ascent-descent of the new formulation enjoys a global convergence guarantee and an exponential convergence rate. We also propose a new interpolating metric that further accelerates the convergence significantly. Numerical results are provided to demonstrate the performance of the proposed methods under multiple settings.
