Initial Value Problem Enhanced Sampling for Closed-Loop Optimal Control Design with Deep Neural Networks

Paper Title

Initial Value Problem Enhanced Sampling for Closed-Loop Optimal Control Design with Deep Neural Networks

Authors

Xuanxi Zhang, Jihao Long, Wei Hu, Weinan E, Jiequn Han

Abstract

Closed-loop optimal control design for high-dimensional nonlinear systems has been a long-standing challenge. Traditional methods, such as solving the associated Hamilton-Jacobi-Bellman equation, suffer from the curse of dimensionality. Recent literature has proposed a promising new approach based on supervised learning, which leverages powerful open-loop optimal control solvers to generate training data and uses neural networks as efficient high-dimensional function approximators to fit the closed-loop optimal control. This approach successfully handles certain high-dimensional optimal control problems but still performs poorly on more challenging ones. One of the crucial reasons for the failure is the so-called distribution mismatch phenomenon induced by the controlled dynamics. In this paper, we investigate this phenomenon and propose the initial value problem enhanced sampling method to mitigate it. We theoretically prove that this sampling strategy improves over the vanilla strategy on the classical linear-quadratic regulator by a factor proportional to the total time duration. We further demonstrate numerically that the proposed sampling strategy significantly improves performance on the tested control problems, including the optimal landing problem of a quadrotor and the optimal reaching problem of a 7-DoF manipulator.
