Revisiting Model-based Value Expansion

Paper title

Revisiting Model-based Value Expansion

Authors

Daniel Palenicek, Michael Lutter, Jan Peters

Abstract

Model-based value expansion methods promise to improve the quality of value function targets and, thereby, the effectiveness of value function learning. However, to date, these methods have been outperformed by Dyna-style algorithms with conceptually simpler 1-step value function targets. This suggests that, in practice, the theoretical justification of value expansion does not hold. We provide a thorough empirical study to shed light on the causes of failure of value expansion methods in practice, which are commonly attributed to compounding model error. By leveraging GPU-based physics simulators, we are able to efficiently use the true dynamics for analysis inside the model-based reinforcement learning loop. Performing extensive comparisons between true and learned dynamics sheds light on this black box. This paper provides a better understanding of the actual problems in value expansion. We provide future directions of research by empirically testing the maximum theoretical performance of current approaches.
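The abstract contrasts H-step value expansion targets with the 1-step targets used by Dyna-style methods and points to compounding model error as the suspected cause of the gap. The following is a minimal NumPy sketch of both ideas, assuming a deterministic model `model(s, a) -> (s_next, r)`, a policy `policy(s)` and a value estimate `value_fn(s)`. All function names, the toy linear dynamics and the reward are hypothetical illustrations, not the authors' implementation (the paper uses GPU-based physics simulators as the true dynamics).

```python
import numpy as np


def value_expansion_target(r0, s1, model, policy, value_fn, horizon, gamma=0.99):
    """H-step value expansion target for a real transition with reward r0 and next state s1.

    Steps 1..horizon-1 are imagined with `model`; horizon=1 reduces to the
    1-step target r0 + gamma * V(s1), which never queries the model.
    """
    target, s, discount = r0, s1, gamma
    for _ in range(horizon - 1):
        a = policy(s)
        s, r = model(s, a)  # imagined transition: model errors enter here
        target += discount * r
        discount *= gamma
    return target + discount * value_fn(s)  # bootstrap from the last imagined state


def make_linear_model(A):
    """Hypothetical toy dynamics s' = A s + a with reward -||s||^2 (assumption)."""
    def model(s, a):
        return A @ s + a, -float(s @ s)
    return model


def rollout_error(A_true, A_learned, s0, steps):
    """L2 distance between true and learned-model rollouts (zero action) at each step."""
    s_true, s_model = s0.copy(), s0.copy()
    errors = []
    for _ in range(steps):
        s_true, s_model = A_true @ s_true, A_learned @ s_model
        errors.append(float(np.linalg.norm(s_true - s_model)))
    return errors


if __name__ == "__main__":
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    A_hat = A + 0.01 * np.eye(2)  # hypothetical, slightly wrong learned model

    def policy(s):
        return np.zeros_like(s)  # zero-action policy for illustration

    def value_fn(s):
        return -float(s @ s)  # hypothetical value estimate

    s1 = np.array([1.0, 1.0])
    print(rollout_error(A, A_hat, s1, 5))  # per-step state error grows with the horizon
    for H in (1, 3, 5):
        true_t = value_expansion_target(-1.0, s1, make_linear_model(A), policy, value_fn, H)
        model_t = value_expansion_target(-1.0, s1, make_linear_model(A_hat), policy, value_fn, H)
        print(H, abs(true_t - model_t))  # H=1 prints 0.0: the 1-step target ignores the model
```

With `horizon=1` the target never calls the model, so model error cannot affect it. For longer horizons, each imagined step starts from the previous (already wrong) prediction, so state errors accumulate; this is the compounding model error the abstract refers to, and swapping the learned model for the true dynamics, as the paper does, removes it.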
