Paper Title

Pontryagin Optimal Control via Neural Networks

Paper Authors

Chengyang Gu, Hui Xiong, Yize Chen

Paper Abstract

Solving real-world optimal control problems is a challenging task, as the complex, high-dimensional system dynamics are usually not revealed to the decision maker, making it hard to compute optimal control actions numerically. To address these modeling and computational challenges, in this paper we integrate neural networks with Pontryagin's Maximum Principle (PMP) and propose a sample-efficient framework, NN-PMP-Gradient. The resulting controller can be implemented for systems with unknown and complex dynamics. By taking an iterative approach, the proposed framework not only utilizes accurate surrogate models parameterized by neural networks, but also efficiently recovers the optimality conditions along with the optimal action sequences via the PMP conditions. Numerical simulations on a Linear Quadratic Regulator, energy arbitrage of a grid-connected lossy battery, control of a single pendulum, and two MuJoCo locomotion tasks demonstrate that NN-PMP-Gradient is a general and versatile computational tool for finding optimal solutions. Compared with widely applied model-free and model-based reinforcement learning (RL) algorithms, NN-PMP-Gradient achieves higher sample efficiency and better performance in terms of control objectives.
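
The abstract only outlines the approach, but the core computation it describes (rolling out a learned neural surrogate of the dynamics, forming the Hamiltonian, propagating costates backward according to the PMP conditions, and taking gradient steps on the action sequence) can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch illustration under assumed simplifications: a discrete-time horizon, an already-trained surrogate f_theta, quadratic stage and terminal costs, and a plain gradient update. All names and hyperparameters (f_theta, running_cost, terminal_cost, T, alpha) are placeholders and do not come from the paper.

import torch

torch.manual_seed(0)
x_dim, u_dim, T = 4, 2, 20          # assumed toy dimensions and horizon
alpha, n_iters = 1e-2, 200          # assumed step size and iteration budget

# Stand-in for the neural-network surrogate of the unknown dynamics x_{t+1} = f_theta(x_t, u_t).
f_theta = torch.nn.Sequential(
    torch.nn.Linear(x_dim + u_dim, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, x_dim),
)

def running_cost(x, u):             # L(x_t, u_t): assumed quadratic stage cost
    return (x ** 2).sum() + 0.1 * (u ** 2).sum()

def terminal_cost(x):               # Phi(x_T): assumed quadratic terminal cost
    return (x ** 2).sum()

x0 = torch.randn(x_dim)
u_seq = [torch.zeros(u_dim, requires_grad=True) for _ in range(T)]

for _ in range(n_iters):
    # Forward pass: roll out the surrogate model under the current action sequence.
    with torch.no_grad():
        xs = [x0]
        for t in range(T):
            xs.append(f_theta(torch.cat([xs[-1], u_seq[t]])))

    # Backward pass: discrete PMP costate recursion,
    #   lam_T = dPhi/dx_T,   lam_t = dH_t/dx_t = dL/dx_t + (df/dx_t)^T lam_{t+1}.
    x_T = xs[-1].detach().requires_grad_(True)
    lam = torch.autograd.grad(terminal_cost(x_T), x_T)[0]
    grads = [None] * T
    for t in reversed(range(T)):
        x_t = xs[t].detach().requires_grad_(True)
        u_t = u_seq[t].detach().requires_grad_(True)
        # Hamiltonian H_t = L(x_t, u_t) + lam_{t+1}^T f_theta(x_t, u_t)
        H = running_cost(x_t, u_t) + lam @ f_theta(torch.cat([x_t, u_t]))
        dH_dx, dH_du = torch.autograd.grad(H, (x_t, u_t))
        grads[t], lam = dH_du, dH_dx   # dJ/du_t = dH_t/du_t by the PMP conditions

    # Gradient step on the actions (the "Gradient" part of NN-PMP-Gradient).
    with torch.no_grad():
        for t in range(T):
            u_seq[t] -= alpha * grads[t]

In the full method described in the abstract, the surrogate f_theta would itself be fit from sampled transitions of the unknown system before (or interleaved with) this control-optimization loop, which is where the reported sample efficiency comes from.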
