Temporal Difference Learning for Model Predictive Control

Paper Title

Temporal Difference Learning for Model Predictive Control

Authors

Nicklas Hansen, Xiaolong Wang, Hao Su

Abstract

Data-driven model predictive control has two key advantages over model-free methods: a potential for improved sample efficiency through model learning, and better performance as computational budget for planning increases. However, it is both costly to plan over long horizons and challenging to obtain an accurate model of the environment. In this work, we combine the strengths of model-free and model-based methods. We use a learned task-oriented latent dynamics model for local trajectory optimization over a short horizon, and use a learned terminal value function to estimate long-term return, both of which are learned jointly by temporal difference learning. Our method, TD-MPC, achieves superior sample efficiency and asymptotic performance over prior work on both state and image-based continuous control tasks from DMControl and Meta-World. Code and video results are available at https://nicklashansen.github.io/td-mpc.
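The core idea of the abstract is to plan over a short horizon with a learned latent model and to summarize everything beyond that horizon with a learned terminal value. A minimal sketch of this idea follows, under stated assumptions. It is not the authors' code. The 1-D latent state and the functions `latent_dynamics`, `reward`, and `terminal_value` are hypothetical toy stand-ins for learned networks. The optimizer is a plain cross-entropy method (CEM), chosen for brevity rather than taken from the paper. `td_target` shows a one-step temporal difference target that such a value function could be regressed toward.

```python
import random
import statistics

# Hypothetical toy stand-ins for learned components (1-D latent state z).
def latent_dynamics(z: float, a: float) -> float:
    """Toy latent transition d(z, a); a learned network in the real method."""
    return z + a

def reward(z: float, a: float) -> float:
    """Toy reward R(z, a): stay near the origin using small actions."""
    return -(z ** 2) - 0.1 * a ** 2

def terminal_value(z: float, gamma: float = 0.9) -> float:
    """Hypothetical terminal value V(z) estimating return beyond the horizon."""
    return -(z ** 2) / (1.0 - gamma)

def estimate_return(z0: float, actions: list, gamma: float = 0.9) -> float:
    """Short latent rollout plus bootstrapped tail:
    sum_{t<H} gamma^t R(z_t, a_t) + gamma^H V(z_H)."""
    z, total = z0, 0.0
    for t, a in enumerate(actions):
        total += gamma ** t * reward(z, a)
        z = latent_dynamics(z, a)
    return total + gamma ** len(actions) * terminal_value(z, gamma)

def plan(z0: float, horizon: int = 3, samples: int = 64, elites: int = 8,
         iters: int = 5, gamma: float = 0.9, seed: int = 0) -> float:
    """CEM over action sequences in [-1, 1]; returns only the first action
    (receding horizon: re-plan at every environment step)."""
    rng = random.Random(seed)
    mean, std = [0.0] * horizon, [1.0] * horizon
    for _ in range(iters):
        scored = []
        for _ in range(samples):
            seq = [max(-1.0, min(1.0, rng.gauss(m, s))) for m, s in zip(mean, std)]
            scored.append((estimate_return(z0, seq, gamma), seq))
        scored.sort(key=lambda c: c[0], reverse=True)
        top = [seq for _, seq in scored[:elites]]
        # Refit the sampling distribution to the elite sequences.
        mean = [statistics.fmean(col) for col in zip(*top)]
        std = [statistics.pstdev(col) + 1e-3 for col in zip(*top)]
    return mean[0]

def td_target(z: float, a: float, gamma: float = 0.9) -> float:
    """One-step TD target r + gamma * V(z') for fitting the value function."""
    return reward(z, a) + gamma * terminal_value(latent_dynamics(z, a), gamma)
```

For example, `plan(2.0)` returns a negative action that pushes the toy latent state toward the origin. The terminal value lets a 3-step search account for costs incurred after the horizon ends. In a real system, both the model and the value function would be learned, and the planner would be re-run at every step.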
