Paper Title

Between Rate-Distortion Theory & Value Equivalence in Model-Based Reinforcement Learning

Paper Authors

Dilip Arumugam, Benjamin Van Roy

Paper Abstract

The quintessential model-based reinforcement-learning agent iteratively refines its estimates or prior beliefs about the true underlying model of the environment. Recent empirical successes in model-based reinforcement learning with function approximation, however, eschew the true model in favor of a surrogate that, while ignoring various facets of the environment, still facilitates effective planning over behaviors. Recently formalized as the value equivalence principle, this algorithmic technique is perhaps unavoidable as real-world reinforcement learning demands consideration of a simple, computationally-bounded agent interacting with an overwhelmingly complex environment. In this work, we entertain an extreme scenario wherein some combination of immense environment complexity and limited agent capacity entirely precludes identifying an exactly value-equivalent model. In light of this, we embrace a notion of approximate value equivalence and introduce an algorithm for incrementally synthesizing simple and useful approximations of the environment from which an agent might still recover near-optimal behavior. Crucially, we recognize the information-theoretic nature of this lossy environment compression problem and use the appropriate tools of rate-distortion theory to make mathematically precise how value equivalence can lend tractability to otherwise intractable sequential decision-making problems.
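
For orientation, the block below states the two standard definitions the abstract combines: the value equivalence condition (in the sense of Grimm et al., 2020) and the classical rate-distortion function. These are textbook formulations added here as background, not expressions taken from the paper itself; the symbols (the policy set Π, function class 𝒱, distortion d, and distortion budget D) are generic placeholders rather than the paper's notation.

```latex
% Value equivalence: a surrogate model \tilde{m} is value-equivalent to the true
% model m^* with respect to a policy set \Pi and a value-function class \mathcal{V}
% if their Bellman operators agree on that class.
\forall \pi \in \Pi,\; \forall v \in \mathcal{V}:\quad
  \mathcal{T}^{\pi}_{\tilde{m}}\, v \;=\; \mathcal{T}^{\pi}_{m^{*}}\, v,
\qquad
  (\mathcal{T}^{\pi}_{m}\, v)(s) \;=\;
  \mathbb{E}_{a \sim \pi(\cdot \mid s)}\!\Big[ r_{m}(s,a)
    + \gamma \,\mathbb{E}_{s' \sim p_{m}(\cdot \mid s,a)}\big[ v(s') \big] \Big].

% Classical rate-distortion function: the fewest bits (rate) needed to encode a
% source X as a compressed representation \hat{X} while keeping the expected
% distortion below a budget D.
R(D) \;=\; \min_{p(\hat{x} \mid x)\,:\;\mathbb{E}\big[ d(X, \hat{X}) \big] \,\le\, D}
  \; I\big(X; \hat{X}\big).
```

The abstract's lossy environment compression can be read in these terms: the environment model plays the role of the source, a simpler surrogate model plays the role of the compressed representation, and the distortion measures how far the surrogate is from being value-equivalent.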
