An Actor Critic Method for Free Terminal Time Optimal Control

Title

An Actor Critic Method for Free Terminal Time Optimal Control

Authors

Evan Burton, Tenavi Nakamura-Zimmerer, Qi Gong, Wei Kang

Abstract

Optimal control problems with free terminal time present many challenges including nonsmooth and discontinuous control laws, irregular value functions, many local optima, and the curse of dimensionality. To overcome these issues, we propose an adaptation of the model-based actor-critic paradigm from the field of Reinforcement Learning via an exponential transformation to learn an approximate feedback control and value function pair. We demonstrate the algorithm's effectiveness on prototypical examples featuring each of the main pathological issues present in problems of this type.
