交替使用MDP自动机

论文标题

交替使用MDP自动机

Alternating Good-for-MDP Automata

论文作者

Hahn, Ernst Moritz, Perez, Mateo, Schewe, Sven, Somenzi, Fabio, Trivedi, Ashutosh, Wojtczak, Dominik

论文摘要

当首先在用于控制MDP的模型增强学习（RL）中首次提出了欧米茄规范目标时，使用确定性的Rabin Automata来尝试提供从其过渡到标量值的直接翻译。尽管这些翻译失败了，但事实证明，可以使用商品MDP（GFM）BüchiAutomata来修复它们。这些是非确定性的Büchi自动机，其非确定性类型有限，尽管不像魔法自动机那样受到限制。实际上，确定性的Rabin Automata具有非常简单的转换为这样的GFM自动机，该GFM自动机在状态和成对的数量中是双线性的。有趣的是，对于确定性的Streett Automata来说，也不能说同样的话：非确定性Rabin或BüchiAutomata的翻译以指数成本的成本也是如此，即使不要求目标自动机为商品。我们需要支付的钱更多才能获得MDP Automaton吗？令人惊讶的答案是，当我们将商品MDP属性扩展到交替的自动机时，我们必须支付的费用要少得多：就像从确定性的Rabin自动机中获得的非确定性GFM自动机一样非确定性Büchi自动机。

When omega-regular objectives were first proposed in model-free reinforcement learning (RL) for controlling MDPs, deterministic Rabin automata were used in an attempt to provide a direct translation from their transitions to scalar values. While these translations failed, it has turned out that it is possible to repair them by using good-for-MDPs (GFM) Büchi automata instead. These are nondeterministic Büchi automata with a restricted type of nondeterminism, albeit not as restricted as in good-for-games automata. Indeed, deterministic Rabin automata have a pretty straightforward translation to such GFM automata, which is bi-linear in the number of states and pairs. Interestingly, the same cannot be said for deterministic Streett automata: a translation to nondeterministic Rabin or Büchi automata comes at an exponential cost, even without requiring the target automaton to be good-for-MDPs. Do we have to pay more than that to obtain a good-for-MDP automaton? The surprising answer is that we have to pay significantly less when we instead expand the good-for-MDP property to alternating automata: like the nondeterministic GFM automata obtained from deterministic Rabin automata, the alternating good-for-MDP automata we produce from deterministic Streett automata are bi-linear in the the size of the deterministic automaton and its index, and can therefore be exponentially more succinct than minimal nondeterministic Büchi automata.

下载PDF全文

下载文献需遵守相关版权规定

论文标题