Title
Bayesian Q-learning With Imperfect Expert Demonstrations
Authors
Abstract
Guided exploration with expert demonstrations improves data efficiency for reinforcement learning, but current algorithms often overuse expert information. We propose a novel algorithm to speed up Q-learning with the help of a limited amount of imperfect expert demonstrations. The algorithm avoids excessive reliance on expert data by relaxing the assumption that the expert is optimal and by gradually reducing the usage of uninformative expert data. Experimentally, we evaluate our approach on a sparse-reward chain environment and on six more complicated Atari games with delayed rewards. With the proposed methods, we achieve better results than Deep Q-learning from Demonstrations (Hester et al., 2017) in most environments.
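For context, the optimal-expert assumption that the abstract relaxes can be seen in the large-margin supervised loss of the DQfD baseline (Hester et al., 2017): J_E(Q) = max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E), where the margin function l(a_E, a) is zero for the demonstrated action a_E and positive otherwise. The minimal sketch below computes this loss for a single state. It illustrates only the baseline's assumption, not the algorithm proposed in this paper. The function name `large_margin_loss` and the default margin value are illustrative assumptions.

```python
import numpy as np


def large_margin_loss(q_values: np.ndarray, expert_action: int, margin: float = 0.8) -> float:
    """DQfD-style large-margin loss for one state (illustrative sketch).

    J_E(Q) = max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E),
    where l(a_E, a) = 0 if a == a_E and `margin` otherwise.
    The default margin is an assumed, illustrative value.
    """
    # Margin function l(a_E, a): zero for the expert action, `margin` for every other action.
    penalties = np.full_like(q_values, margin, dtype=float)
    penalties[expert_action] = 0.0
    # The loss is zero only when Q(s, a_E) exceeds every other Q(s, a) by at least `margin`.
    return float(np.max(q_values + penalties) - q_values[expert_action])


if __name__ == "__main__":
    q = np.array([1.0, 2.0, 0.5])
    # Expert action 1 already beats the others by more than the margin, so the loss is 0.0.
    print(large_margin_loss(q, expert_action=1))
    # Expert action 0 yields a loss of about 1.8, pushing Q(s, 0) up even though action 1 currently looks better.
    print(large_margin_loss(q, expert_action=0))
```

Because this loss keeps pushing the agent toward the demonstrated action whether or not that action is actually good, training on imperfect demonstrations can lead to over-reliance on them. That is the behavior the proposed algorithm is designed to avoid.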