Paper Title
Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization under Model Uncertainty
Paper Authors
Paper Abstract
In this paper, we focus on the problem of robustifying reinforcement learning (RL) algorithms with respect to model uncertainties. In the framework of model-based RL, we propose to merge the theory of constrained Markov decision processes (CMDPs) with the theory of robust Markov decision processes (RMDPs), leading to a formulation of robust constrained-MDPs (RCMDPs). This formulation, simple in essence, allows us to design RL algorithms that are robust in performance and provide constraint-satisfaction guarantees with respect to uncertainties in the system's state transition probabilities. RCMDPs are important for real-life applications of RL. For instance, such a formulation can play an important role in policy transfer from simulation to the real world (Sim2Real) for safety-critical applications, which benefit from performance and safety guarantees that are robust w.r.t. model uncertainty. We first present the general problem formulation under the RCMDP concept, then propose a Lagrangian relaxation of the optimization problem, leading to a robust-constrained policy gradient RL algorithm. We finally validate this concept on the inventory management problem.
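As a rough illustration of the Lagrangian formulation the abstract refers to, the problem can be sketched in standard CMDP/RMDP notation as follows; the symbols here (uncertainty set \mathcal{P}, cost c, constraint cost d, constraint budget \hat{d}, discount \gamma, multiplier \lambda) are illustrative assumptions, not notation taken from the paper itself:

\[
\min_{\pi}\; \max_{p \in \mathcal{P}}\; \mathbb{E}^{\pi, p}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\Big]
\quad \text{s.t.} \quad
\max_{p \in \mathcal{P}}\; \mathbb{E}^{\pi, p}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t}\, d(s_t, a_t)\Big] \;\le\; \hat{d}.
\]

Relaxing the hard constraint with a multiplier \lambda \ge 0 gives a soft-constrained saddle-point problem of the form

\[
\min_{\pi}\; \max_{\lambda \ge 0}\; L(\pi, \lambda)
\;=\; \max_{p \in \mathcal{P}}\, \mathbb{E}^{\pi, p}\Big[\textstyle\sum_{t} \gamma^{t} c(s_t, a_t)\Big]
\;+\; \lambda \Big( \max_{p \in \mathcal{P}}\, \mathbb{E}^{\pi, p}\Big[\textstyle\sum_{t} \gamma^{t} d(s_t, a_t)\Big] - \hat{d} \Big),
\]

whose gradient with respect to the policy parameters would yield a robust-constrained policy gradient algorithm of the kind mentioned above; the penalty term \lambda(\cdot) is what makes the constraint "soft" in the sense of the title.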