Title
Markov Decision Processes with Recursive Risk Measures
Authors
Abstract
In this paper, we consider risk-sensitive Markov Decision Processes (MDPs) with Borel state and action spaces and unbounded cost under both finite and infinite planning horizons. Our optimality criterion is based on the recursive application of static risk measures. This approach is motivated by recursive utilities in the economic literature. It has been studied before for the entropic risk measure and is extended here to an axiomatic characterization of suitable risk measures. We derive a Bellman equation and prove the existence of Markovian optimal policies. For an infinite planning horizon, the model is shown to be contractive and the optimal policy to be stationary. Moreover, we establish a connection to distributionally robust MDPs, which provides a global interpretation of the recursively defined objective function. Monotone models are studied in particular.
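To make the recursive criterion concrete, the following is a minimal sketch of value iteration in which the expectation in the Bellman operator is replaced by a static risk measure. It is an illustration under strong simplifying assumptions and not the paper's construction: the state and action spaces are finite rather than Borel, the cost is bounded and depends only on the current state and action, and the entropic risk measure is used as the one-step risk measure. The names `entropic_risk`, `risk_value_iteration`, `cost`, `trans`, `beta`, and `gamma` are hypothetical and chosen for this example only.

```python
import numpy as np


def entropic_risk(values, probs, gamma):
    """Entropic risk (1/gamma) * log E[exp(gamma * X)] of a discrete X, gamma > 0."""
    # Shift by the maximum before exponentiating to avoid overflow.
    shift = np.max(values)
    return shift + np.log(np.sum(probs * np.exp(gamma * (values - shift)))) / gamma


def risk_value_iteration(cost, trans, beta, gamma, tol=1e-10, max_iter=10_000):
    """Iterate J(s) = min_a { cost[s, a] + rho(beta * J(next state)) }.

    cost:  array (n_states, n_actions), assumed bounded (toy setting).
    trans: array (n_states, n_actions, n_states) of transition probabilities.
    beta:  discount factor in (0, 1).
    gamma: risk-aversion parameter of the entropic risk measure.
    """
    n_states, n_actions = cost.shape
    J = np.zeros(n_states)
    for _ in range(max_iter):
        # One-step risk-adjusted cost of each state-action pair.
        Q = np.array([
            [cost[s, a] + entropic_risk(beta * J, trans[s, a], gamma)
             for a in range(n_actions)]
            for s in range(n_states)
        ])
        J_new = Q.min(axis=1)
        converged = np.max(np.abs(J_new - J)) < tol
        J = J_new
        if converged:
            break
    # Greedy stationary policy with respect to the last iterate.
    return J, Q.argmin(axis=1)


if __name__ == "__main__":
    # Hypothetical two-state, two-action example.
    cost = np.array([[1.0, 2.0], [0.5, 3.0]])
    trans = np.array([[[0.7, 0.3], [0.2, 0.8]],
                      [[0.5, 0.5], [0.9, 0.1]]])
    J, policy = risk_value_iteration(cost, trans, beta=0.9, gamma=1.0)
    print("value function:", J)
    print("stationary policy:", policy)
```

In this toy setting the iteration converges because the risk measure is monotone and translation invariant, so the operator is a contraction with modulus `beta` in the supremum norm. This mirrors, in a much simpler form, the contractivity and stationarity statements of the abstract.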