Title
Transience in Countable MDPs
Authors
Abstract
The Transience objective is not to visit any state infinitely often. While this is not possible in finite Markov Decision Processes (MDPs), it can be satisfied in countably infinite ones, e.g., if the transition graph is acyclic. We prove the following fundamental properties of Transience in countably infinite MDPs. 1. There exist uniformly $ε$-optimal MD strategies (memoryless deterministic) for Transience, even in infinitely branching MDPs. 2. Optimal strategies for Transience need not exist, even if the MDP is finitely branching. However, if an optimal strategy exists then there is also an optimal MD strategy. 3. If an MDP is universally transient (i.e., almost surely transient under all strategies) then many other objectives have a lower strategy complexity than in general MDPs. E.g., $ε$-optimal strategies for Safety and co-Büchi and optimal strategies for $\{0,1,2\}$-Parity (where they exist) can be chosen MD, even if the MDP is infinitely branching.
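The objective described informally above has a standard formal reading; a sketch of the definition, assuming $S$ denotes the countable state space and runs are infinite state sequences (the symbol names here are illustrative, not taken from the paper):

```latex
% Transience: the set of runs that visit every state only finitely often.
% S is the (countable) state space; s_0 s_1 s_2 ... ranges over infinite runs.
\mathtt{Transience} \;=\;
  \bigl\{\, s_0 s_1 s_2 \cdots \in S^{\omega}
    \;\bigm|\;
    \forall s \in S:\ \bigl|\{\, i \in \mathbb{N} \mid s_i = s \,\}\bigr| < \infty
  \,\bigr\}
```

Under this reading, a finite MDP can never satisfy Transience (some state must recur infinitely often by pigeonhole), whereas an infinite acyclic transition graph satisfies it trivially, matching the remark in the abstract.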