Paper Title
Offline Reinforcement Learning Under Value and Density-Ratio Realizability: The Power of Gaps
Paper Authors
Paper Abstract
We consider a challenging theoretical problem in offline reinforcement learning (RL): obtaining sample-efficiency guarantees with a dataset lacking sufficient coverage, under only realizability-type assumptions for the function approximators. While the existing theory has addressed learning under realizability and learning under non-exploratory data separately, no work has been able to address both simultaneously (except for a concurrent work, which we compare against in detail). Under an additional gap assumption, we provide guarantees for a simple pessimistic algorithm based on a version space formed by marginalized importance sampling (MIS), and the guarantee only requires the data to cover the optimal policy and the function classes to realize the optimal value and density-ratio functions. While similar gap assumptions have been used in other areas of RL theory, our work is the first to identify the utility and the novel mechanism of gap assumptions in offline RL with weak function approximation.
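To make the abstract's description concrete, below is a minimal, hypothetical sketch of what a pessimistic, MIS-based version-space procedure could look like, assuming finite candidate classes of Q-functions and weight (density-ratio) functions, a finite action set, and a dataset of transitions; all names (ACTIONS, mis_residual, pessimistic_mis_selection, eps) are illustrative and not taken from the paper, and the actual algorithm and analysis may differ.

```python
# Illustrative sketch only: names and structure are assumptions, not the paper's code.

ACTIONS = [0, 1]  # hypothetical finite action set


def mis_residual(q, w, data, gamma=0.99):
    """Average importance-weighted Bellman residual of candidate q under weight w.

    `data` is assumed to be a list of (s, a, r, s_next) transitions collected
    by the (possibly non-exploratory) behavior policy; q and w are callables.
    """
    total = 0.0
    for s, a, r, s_next in data:
        td_target = r + gamma * max(q(s_next, a2) for a2 in ACTIONS)
        total += w(s, a) * (td_target - q(s, a))
    return total / len(data)


def pessimistic_mis_selection(q_class, w_class, data, s0, eps):
    """Form the MIS version space, then pick the most pessimistic survivor.

    A candidate q survives if no weight function in w_class can certify a
    large Bellman residual on the data; among survivors we return the one
    with the smallest optimal value at the initial state s0 (pessimism),
    together with its greedy policy.
    """
    version_space = [
        q for q in q_class
        if max(abs(mis_residual(q, w, data)) for w in w_class) <= eps
    ]
    q_pess = min(version_space, key=lambda q: max(q(s0, a) for a in ACTIONS))
    policy = lambda s: max(ACTIONS, key=lambda a: q_pess(s, a))
    return q_pess, policy
```

In this reading, realizability ensures the optimal value and density-ratio functions lie in the candidate classes, coverage of the optimal policy keeps the corresponding candidate in the version space, and the gap assumption is what lets a small value error at the chosen candidate translate into a near-optimal greedy policy; the precise statement is in the paper.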