Paper Title
Dynamic Programming Through the Lens of Semismooth Newton-Type Methods (Extended Version)
Paper Authors
Paper Abstract
Policy iteration and value iteration are at the core of many (approximate) dynamic programming methods. For Markov Decision Processes with finite state and action spaces, we show that they are instances of semismooth Newton-type methods to solve the Bellman equation. In particular, we prove that policy iteration is equivalent to the exact semismooth Newton method and enjoys local quadratic convergence rate. This finding is corroborated by extensive numerical evidence in the fields of control and operations research, which confirms that policy iteration generally requires few iterations to achieve convergence even when the number of policies is vast. We then show that value iteration is an instance of the fixed-point iteration method. In this spirit, we develop a novel locally accelerated version of value iteration with global convergence guarantees and negligible extra computational costs.
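To make the abstract's claim concrete, the sketch below runs both algorithms on a toy two-state, two-action MDP (the transition probabilities, rewards, and discount factor are hypothetical, chosen only for illustration). Value iteration is implemented as fixed-point iteration on the Bellman optimality operator, a γ-contraction; policy iteration alternates exact policy evaluation (a linear solve, which is the semismooth Newton step) with greedy improvement, and typically terminates in very few iterations.

```python
import numpy as np

# Toy MDP (hypothetical numbers, not from the paper):
# P[a, s, s'] = transition probability, r[s, a] = reward, gamma = discount.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.7, 0.3]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

def q_values(V):
    # Q(s, a) = r(s, a) + gamma * sum_s' P(s' | s, a) * V(s')
    return r + gamma * np.einsum('ast,t->sa', P, V)

def value_iteration(tol=1e-10):
    # Fixed-point iteration V_{k+1} = T V_k on the Bellman optimality
    # operator T, which contracts with modulus gamma in the sup norm.
    V, k = np.zeros(2), 0
    while True:
        V_new = q_values(V).max(axis=1)
        k += 1
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, k
        V = V_new

def policy_iteration():
    # Exact policy evaluation (a linear solve -- the Newton-type step)
    # alternated with greedy policy improvement.
    pi, k = np.zeros(2, dtype=int), 0
    while True:
        P_pi = P[pi, np.arange(2)]       # transition matrix under pi
        r_pi = r[np.arange(2), pi]       # reward vector under pi
        V = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
        pi_new = q_values(V).argmax(axis=1)
        k += 1
        if np.array_equal(pi_new, pi):
            return V, pi, k
        pi = pi_new

V_vi, iters_vi = value_iteration()
V_pi, pi_star, iters_pi = policy_iteration()
```

On this tiny example both methods reach the same optimal value function, but policy iteration needs an order of magnitude fewer iterations than value iteration, consistent with the fast local convergence discussed in the abstract.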