Paper Title
Scalable First-Order Methods for Robust MDPs
Paper Authors
Paper Abstract
Robust Markov Decision Processes (MDPs) are a powerful framework for modeling sequential decision-making problems under model uncertainty. This paper proposes the first first-order framework for solving robust MDPs. Our algorithm interleaves primal-dual first-order updates with approximate Value Iteration updates. By carefully controlling the tradeoff between the accuracy and cost of the Value Iteration updates, we achieve an ergodic convergence rate of $O\left( A^{2} S^{3}\log(S)\log(\epsilon^{-1})\epsilon^{-1} \right)$ for the best choice of parameters on ellipsoidal and Kullback-Leibler $s$-rectangular uncertainty sets, where $S$ and $A$ are the numbers of states and actions, respectively. Our dependence on the numbers of states and actions is significantly better (by a factor of $O(A^{1.5}S^{1.5})$) than that of pure Value Iteration algorithms. In numerical experiments on ellipsoidal uncertainty sets, we show that our algorithm is significantly more scalable than state-of-the-art approaches. Our framework is also the first one to solve robust MDPs with $s$-rectangular KL uncertainty sets.
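The robust Bellman backup at the heart of the Value Iteration updates can be illustrated with a minimal sketch. The snippet below runs robust value iteration over a finite, $(s,a)$-rectangular set of candidate transition kernels, where the adversary picks the worst kernel at each state-action pair. This is a deliberate simplification: the paper treats ellipsoidal and KL $s$-rectangular sets, whose inner minimization is handled by the primal-dual first-order updates rather than enumeration. All function and variable names here are our own illustrative choices.

```python
import numpy as np


def robust_value_iteration(P_set, r, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Robust value iteration over a finite uncertainty set of kernels.

    P_set : list of arrays of shape (S, A, S), candidate transition kernels
    r     : array of shape (S, A), rewards
    Returns the robust value function, an array of shape (S,).
    """
    S, A = r.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # Worst-case Bellman backup: for each (s, a), the adversary
        # chooses the kernel that minimizes the continuation value.
        Q = np.min([r + gamma * (P @ V) for P in P_set], axis=0)  # (S, A)
        V_new = Q.max(axis=1)  # agent maximizes over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```

Each sweep costs $O(|P_{\text{set}}| \, A S^2)$; for continuous sets such as the ellipsoidal or KL balls in the paper, the inner `min` becomes a convex program, which is exactly where interleaved first-order updates pay off.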