Paper title

Deep Learning Tubes for Tube MPC

Authors

David D. Fan, Ali-akbar Agha-mohammadi, Evangelos A. Theodorou

Abstract

Learning-based control aims to construct models of a system to use for planning or trajectory optimization, e.g. in model-based reinforcement learning. In order to obtain guarantees of safety in this context, uncertainty must be accurately quantified. This uncertainty may come from errors in learning (due to a lack of data, for example), or may be inherent to the system. Propagating uncertainty forward in learned dynamics models is a difficult problem. In this work we use deep learning to obtain expressive and flexible models of how distributions of trajectories behave, which we then use for nonlinear Model Predictive Control (MPC). We introduce a deep quantile regression framework for control that enforces probabilistic quantile bounds and quantifies epistemic uncertainty. Using our method we explore three different approaches for learning tubes that contain the possible trajectories of the system, and demonstrate how to use each of them in a Tube MPC scheme. We prove these schemes are recursively feasible and satisfy constraints with a desired margin of probability. We present experiments in simulation on a nonlinear quadrotor system, demonstrating the practical efficacy of these ideas.
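
The abstract does not spell out the training objective behind its quantile bounds. As a rough illustration only, the sketch below uses the standard quantile (pinball) loss from the quantile regression literature, which is an assumption here and not necessarily the exact loss used in the paper. The sample values, the quantile level `tau = 0.9`, and all function names are hypothetical. The point it shows: minimizing the pinball loss over a constant bound recovers an empirical quantile, so a bound fitted this way covers roughly a fraction `tau` of the observed tracking errors.

```python
def pinball_loss(y, q, tau):
    """Standard quantile (pinball) loss for target y, prediction q, level tau."""
    err = y - q
    # Under-prediction (y > q) is weighted by tau, over-prediction by (1 - tau).
    return max(tau * err, (tau - 1.0) * err)


def mean_pinball(samples, q, tau):
    """Average pinball loss of a constant prediction q over all samples."""
    return sum(pinball_loss(y, q, tau) for y in samples) / len(samples)


# Hypothetical tracking-error magnitudes (arbitrary units), not data from the paper.
errors = list(range(1, 22))  # 1, 2, ..., 21
tau = 0.9                    # assumed quantile level for the tube bound

# Brute-force search over the sample values for the best constant bound.
best_q = min(errors, key=lambda q: mean_pinball(errors, q, tau))

# Fraction of samples that fall inside the fitted bound.
coverage = sum(1 for y in errors if y <= best_q) / len(errors)

print(best_q, coverage)
```

In a learned tube, the constant `q` would be replaced by a network output conditioned on the state and input, with the same loss averaged over training trajectories; that extension is not shown here.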
