Paper title
Sequential Optimization of CVaR
Paper authors
Paper abstract
This paper studies optimization of the Conditional Value at Risk (CVaR) for a discounted total-cost Markov Decision Process (MDP) with finite state and action sets. This CVaR optimization problem can be reformulated as a Robust MDP (RMDP) with a compact state space. States in this RMDP are the original states of the problem augmented with tail risk levels, and the Decision Maker (DM) knows only the initial tail risk level at the initial state and time. Thus, to find an optimal policy following this approach, the DM needs to solve an RMDP with incomplete state observations: after the first move, the DM observes the states of the system, but the tail risk levels are unknown. This paper shows that, for the CVaR optimization problem, the corresponding RMDP can be solved by using methods of convex analysis. It introduces an algorithm for computing and implementing an optimal CVaR policy by using the value function for the version of this RMDP with completely observable tail risk levels at all states. This algorithm and the major results of the paper are presented for the more general problem of optimizing the sum of a mean and a CVaR for possibly different cost functions.
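For readers unfamiliar with the objective, the following is a minimal sketch of how the CVaR of a discrete cost distribution can be evaluated. It is not the paper's algorithm. It assumes the convention in which the tail risk level `alpha` in (0, 1] is the probability mass of the worst outcomes, so that `alpha = 1` recovers the mean, and it uses the standard Rockafellar-Uryasev minimization formula. The function name `cvar_discrete` and the example numbers are hypothetical.

```python
import numpy as np


def cvar_discrete(costs, probs, alpha):
    """CVaR of a discrete cost distribution at tail level alpha in (0, 1].

    Assumed convention: alpha is the probability mass of the worst (largest)
    costs, so the result is the expected cost over that tail.
    """
    costs = np.asarray(costs, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not np.isclose(probs.sum(), 1.0):
        raise ValueError("probs must sum to 1")

    # Rockafellar-Uryasev formula: CVaR = min_z { z + E[(X - z)^+] / alpha }.
    # The objective is convex and piecewise linear in z with breakpoints at
    # the atoms of X, so it suffices to evaluate it at each atom.
    candidates = [
        z + np.dot(probs, np.maximum(costs - z, 0.0)) / alpha for z in costs
    ]
    return float(min(candidates))


# Hypothetical total costs of three outcomes and their probabilities.
costs = [0.0, 10.0, 100.0]
probs = [0.5, 0.4, 0.1]

mean_cost = cvar_discrete(costs, probs, alpha=1.0)   # equals the expectation
tail_cost = cvar_discrete(costs, probs, alpha=0.2)   # average of the worst 20%
```

In this example the worst 20% of the probability mass consists of the outcome with cost 100 (mass 0.1) and part of the outcome with cost 10 (mass 0.1), so the tail average exceeds the mean. A weighted sum of `mean_cost` and `tail_cost` would be one instance of the mean-plus-CVaR objective mentioned in the abstract.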