Title
History-dependent evaluations in POMDPs
Authors
Abstract
We consider POMDPs in which the weight of the stage payoff depends on the past sequence of signals and actions occurring in the infinitely repeated problem. We prove that for all epsilon > 0, there exists a strategy that is epsilon-optimal for any sequence of weights satisfying a property that can be interpreted as "the decision-maker is patient enough". This unifies and generalizes several results in the literature, and applies notably to POMDPs with limsup payoffs.