Paper Title
Comparing Model-free and Model-based Algorithms for Offline Reinforcement Learning
Paper Authors
Paper Abstract
Offline reinforcement learning (RL) algorithms are often designed with environments such as MuJoCo in mind, in which the planning horizon is extremely long and no noise exists. We compare model-free, model-based, and hybrid offline RL approaches on various industrial benchmark (IB) datasets to test the algorithms in settings closer to real-world problems, including complex noise and partially observable states. We find that on the IB, hybrid approaches face severe difficulties, and that simpler algorithms, such as rollout-based algorithms or model-free algorithms with simpler regularizers, perform best on these datasets.