Paper Title
Scalable First-Order Methods for Robust MDPs
Paper Authors
Paper Abstract
Robust Markov Decision Processes (MDPs) are a powerful framework for modeling sequential decision-making problems under model uncertainty. This paper proposes the first first-order framework for solving robust MDPs. Our algorithm interleaves primal-dual first-order updates with approximate Value Iteration updates. By carefully controlling the tradeoff between the accuracy and cost of the Value Iteration updates, we achieve an ergodic convergence rate of $O\left( A^{2} S^{3}\log(S)\log(\epsilon^{-1})\epsilon^{-1} \right)$ for the best choice of parameters on ellipsoidal and Kullback-Leibler $s$-rectangular uncertainty sets, where $S$ and $A$ are the numbers of states and actions, respectively. Our dependence on the numbers of states and actions is significantly better (by a factor of $O(A^{1.5}S^{1.5})$) than that of pure Value Iteration algorithms. In numerical experiments on ellipsoidal uncertainty sets, we show that our algorithm is significantly more scalable than state-of-the-art approaches. Our framework is also the first one to solve robust MDPs with $s$-rectangular KL uncertainty sets.
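The robust Bellman backup at the heart of the Value Iteration updates can be illustrated with a minimal sketch. The snippet below runs robust value iteration over a finite, $(s,a)$-rectangular set of candidate transition kernels, where the adversary picks the worst kernel at each state-action pair. This is a deliberate simplification: the paper treats ellipsoidal and KL $s$-rectangular sets, whose inner minimization is handled by the primal-dual first-order updates rather than enumeration. All function and variable names here are our own illustrative choices.

```python
import numpy as np


def robust_value_iteration(P_set, r, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Robust value iteration over a finite uncertainty set of kernels.

    P_set : list of arrays of shape (S, A, S), candidate transition kernels
    r     : array of shape (S, A), rewards
    Returns the robust value function, an array of shape (S,).
    """
    S, A = r.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # Worst-case Bellman backup: for each (s, a), the adversary
        # chooses the kernel that minimizes the continuation value.
        Q = np.min([r + gamma * (P @ V) for P in P_set], axis=0)  # (S, A)
        V_new = Q.max(axis=1)  # agent maximizes over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```

Each sweep costs $O(|P_{\text{set}}| \, A S^2)$; for continuous sets such as the ellipsoidal or KL balls in the paper, the inner `min` becomes a convex program, which is exactly where interleaved first-order updates pay off.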