Paper Title
A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning
Paper Authors
Paper Abstract
With the increasing need for handling large state and action spaces, general function approximation has become a key technique in reinforcement learning (RL). In this paper, we propose a general framework that unifies model-based and model-free RL, and an Admissible Bellman Characterization (ABC) class that subsumes nearly all Markov Decision Process (MDP) models in the literature for tractable RL. We propose a novel estimation function with decomposable structural properties for optimization-based exploration, and the functional eluder dimension as a complexity measure of the ABC class. Under our framework, a new sample-efficient algorithm, OPtimization-based ExploRation with Approximation (OPERA), is proposed, achieving regret bounds that match or improve over the best-known results for a variety of MDP models. In particular, for MDPs with low Witness rank, under a slightly stronger assumption, OPERA improves the state-of-the-art sample complexity results by a factor of $dH$. Our framework provides a generic interface for designing and analyzing new RL models and algorithms.
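As a rough illustration of what "optimization-based exploration" of this kind typically looks like (a minimal sketch under assumed notation, not the paper's exact formulation): at each episode $t$, the learner selects an optimistic hypothesis by solving a constrained program of the form

$$
f_t \;=\; \mathop{\mathrm{arg\,max}}_{f \in \mathcal{F}} \; V_f(s_1)
\qquad \text{subject to} \qquad
\sum_{i=1}^{t-1} \big\| \hat{\mathcal{E}}(f, \zeta_i) \big\|^2 \;\le\; \beta,
$$

where $V_f(s_1)$ denotes the initial-state value predicted by hypothesis $f$, $\hat{\mathcal{E}}$ is an estimation function measuring Bellman-style inconsistency of $f$ on past data $\zeta_1, \dots, \zeta_{t-1}$, and $\beta$ is a confidence radius; all of these symbols are illustrative placeholders rather than the paper's definitions. The constraint restricts the search to hypotheses consistent with observed data, while the maximization enforces optimism, and the agent then acts according to the policy induced by $f_t$.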