Paper Title

A Unified Framework for Alternating Offline Model Training and Policy Learning

Authors

Shentao Yang, Shujian Zhang, Yihao Feng, Mingyuan Zhou

Abstract

In offline model-based reinforcement learning (offline MBRL), we learn a dynamic model from historically collected data, and subsequently utilize the learned model and fixed datasets for policy learning, without further interacting with the environment. Offline MBRL algorithms can improve the efficiency and stability of policy learning over the model-free algorithms. However, in most of the existing offline MBRL algorithms, the learning objectives for the dynamic models and the policies are isolated from each other. Such an objective mismatch may lead to inferior performance of the learned agents. In this paper, we address this issue by developing an iterative offline MBRL framework, where we maximize a lower bound of the true expected return, by alternating between dynamic-model training and policy learning. With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets. Source code is publicly released.
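
The abstract describes an alternating optimization: the dynamics model and the policy are updated in turns so that both steps serve one shared objective (a lower bound on the true expected return), rather than being trained in isolation. The sketch below is only a minimal illustration of that control flow, not the authors' method: the toy dataset and the `update_dynamics_model` / `update_policy` placeholders are hypothetical stand-ins for the paper's actual model-fitting and policy-improvement objectives.

```python
# Minimal sketch of an alternating offline MBRL loop (illustrative only).
# All update rules below are hypothetical placeholders, not the paper's objective.
import numpy as np

rng = np.random.default_rng(0)

# Fixed offline dataset of (state, action, next_state, reward) transitions.
offline_data = {
    "s": rng.normal(size=(1000, 3)),
    "a": rng.normal(size=(1000, 1)),
    "s_next": rng.normal(size=(1000, 3)),
    "r": rng.normal(size=(1000,)),
}

def update_dynamics_model(model_params, data, policy_params):
    """Placeholder: refit the dynamics model on the offline data while taking
    the current policy into account, so model training and policy learning
    optimize a shared objective instead of plain maximum likelihood."""
    return model_params - 0.01 * rng.normal(size=model_params.shape)

def update_policy(policy_params, model_params, data):
    """Placeholder: improve the policy using the current learned model and the
    offline data, i.e. (approximately) maximize a model-based lower bound on
    the expected return."""
    return policy_params + 0.01 * rng.normal(size=policy_params.shape)

model_params = rng.normal(size=(8,))
policy_params = rng.normal(size=(4,))

# Alternate between model training and policy learning: each update sees the
# other's current iterate, with no further environment interaction.
for iteration in range(10):
    model_params = update_dynamics_model(model_params, offline_data, policy_params)
    policy_params = update_policy(policy_params, model_params, offline_data)
```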
