Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse

Paper Title

Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse

Authors

James Queeney, Ioannis Ch. Paschalidis, Christos G. Cassandras

Abstract

We develop a new class of model-free deep reinforcement learning algorithms for data-driven, learning-based control. Our Generalized Policy Improvement algorithms combine the policy improvement guarantees of on-policy methods with the efficiency of sample reuse, addressing a trade-off between two important deployment requirements for real-world control: (i) practical performance guarantees and (ii) data efficiency. We demonstrate the benefits of this new class of algorithms through extensive experimental analysis on a broad range of simulated control tasks.