Policy Gradient Methods for Designing Dynamic Output Feedback Controllers

Paper title

Policy Gradient Methods for Designing Dynamic Output Feedback Controllers

Authors

Sadamoto, Tomonori; Hirai, Takumi

Abstract

This paper proposes model-based and model-free policy gradient methods (PGMs) for designing dynamic output feedback controllers for discrete-time partially observable systems. To this end, we first show that any dynamic output feedback controller design is equivalent to a state-feedback controller design for a newly introduced system whose internal state is a finite-length input-output history (IOH). Next, based on this equivalence, we propose a model-based PGM and show its global linear convergence by proving that the Polyak-Łojasiewicz inequality holds for a reachability-based lossless projection of the IOH dynamics. Moreover, we propose two model-free implementations of the PGM: the multi-episodic and single-episodic PGMs. The former is a Monte Carlo approximation of the model-based PGM, whereas the latter is a simplified version of the former that is easier to use on real systems. A sample complexity analysis of both methods is also presented. Finally, the effectiveness of the model-based and model-free PGMs is investigated through a numerical simulation.
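To make the IOH reformulation concrete, here is a minimal Python/NumPy sketch. Everything in it is an illustrative assumption rather than the authors' algorithm: the plant (a double integrator observed only through its position), the IOH ordering z_t = [y_t, ..., y_{t-L+1}, u_{t-1}, ..., u_{t-L}] with zero history before t = 0, the finite-horizon quadratic cost, and the gradient estimator, which is a generic two-point random-direction (Monte Carlo) estimate with a backtracking step, not the paper's multi-episodic estimator. The helper names `rollout_cost`, `two_point_gradient`, and `policy_gradient` are hypothetical. The sketch only illustrates the core idea: the controller is a static gain K acting on the IOH, and the update loop touches the plant only through rollout costs, as a model-free method would.

```python
import numpy as np

# Hypothetical plant (not from the paper): a double integrator whose
# position is measured but whose velocity is not.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
L = 2    # assumed IOH length
T = 50   # finite rollout horizon standing in for the infinite-horizon cost
R = 0.1  # assumed input weight
X0 = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # fixed initial states


def rollout_cost(K):
    """Sum of y_t^2 + R u_t^2 under the static IOH feedback u_t = K @ z_t."""
    total = 0.0
    with np.errstate(over="ignore", invalid="ignore"):  # unstable K -> inf/nan
        for x0 in X0:
            x = x0.copy()
            ys = np.zeros(L)  # [y_t, ..., y_{t-L+1}], zero before t = 0
            us = np.zeros(L)  # [u_{t-1}, ..., u_{t-L}], zero before t = 0
            for _ in range(T):
                y = C[0] @ x
                ys = np.concatenate(([y], ys[:-1]))
                z = np.concatenate((ys, us))  # IOH used as the controller "state"
                u = K @ z
                total += y**2 + R * u**2
                x = A @ x + B[:, 0] * u
                us = np.concatenate(([u], us[:-1]))
    return total


def two_point_gradient(K, rng, n_dirs=16, delta=1e-3):
    """Monte Carlo gradient estimate from random-direction cost differences."""
    g = np.zeros_like(K)
    for _ in range(n_dirs):
        d = rng.standard_normal(K.shape)
        diff = rollout_cost(K + delta * d) - rollout_cost(K - delta * d)
        g += diff / (2.0 * delta) * d
    return g / n_dirs


def policy_gradient(K, iters=20, step=1e-4, seed=0):
    """Gradient descent on K that accepts a step only if the cost drops."""
    rng = np.random.default_rng(seed)
    J = rollout_cost(K)
    for _ in range(iters):
        g = two_point_gradient(K, rng)
        eta = step
        while eta > 1e-12:  # backtracking line search
            K_new = K - eta * g
            J_new = rollout_cost(K_new)
            if J_new < J:  # nan/inf costs never pass this check
                K, J = K_new, J_new
                break
            eta *= 0.5
    return K, J


K0 = np.zeros(2 * L)  # u_t = 0: the open-loop plant
J0 = rollout_cost(K0)
K_opt, J_opt = policy_gradient(K0)
print(f"initial cost {J0:.2f}, tuned cost {J_opt:.2f}, K = {np.round(K_opt, 3)}")
```

Because a step is accepted only when it strictly lowers the rollout cost, the returned cost never exceeds the initial one. No convergence rate is implied here; the global linear convergence stated above concerns the paper's own model-based PGM, not this sketch.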
