Paper Title
Policy Gradient-based Algorithms for Continuous-time Linear Quadratic Control
Paper Authors
Paper Abstract
We consider the continuous-time Linear-Quadratic-Regulator (LQR) problem in terms of optimizing a real-valued matrix function over the set of feedback gains. The results developed parallel those in Bu et al. [1] for discrete-time LTI systems. In this direction, we characterize several analytical properties (smoothness, coerciveness, quadratic growth) that are crucial in the analysis of gradient-based algorithms. We also point out similarities and distinctive features of the continuous-time setup in comparison with its discrete-time analogue. First, we examine three types of well-posed flows for direct policy update in LQR: gradient flow, natural gradient flow, and quasi-Newton flow. The coercive property of the corresponding cost function implies that these flows admit unique solutions, while the gradient-dominated property indicates that the underlying Lyapunov functionals decay at an exponential rate; quadratic growth, on the other hand, guarantees that the trajectories of these flows are exponentially stable in the sense of Lyapunov. We then discuss the forward Euler discretization of these flows, realized as gradient descent, natural gradient descent, and the quasi-Newton iteration. We present stepsize criteria for gradient descent and natural gradient descent, guaranteeing that both algorithms converge linearly to the global optimum. An optimal stepsize for the quasi-Newton iteration is also proposed, guaranteeing a $Q$-quadratic convergence rate and, in the process, recovering the Kleinman-Newton iteration. Lastly, we examine LQR state-feedback synthesis with a sparsity pattern. In this case, we develop the necessary formalism and insights for projected gradient descent, allowing us to guarantee a sublinear rate of convergence to a first-order stationary point.
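To make the update schemes named in the abstract concrete, below is a minimal numerical sketch (not the authors' code) using the standard policy-gradient expressions for continuous-time LQR: with closed-loop matrix $A - BK$, the value matrix $P_K$ and the state-covariance matrix $Y_K$ solve two Lyapunov equations, and $\nabla f(K) = 2(RK - B^\top P_K)Y_K$. All problem data, the stepsize, and the sparsity pattern are illustrative assumptions; scipy is used for the Lyapunov and Riccati solves.

```python
# A minimal numerical sketch (not the authors' code) of gradient descent,
# natural gradient descent, the quasi-Newton (Kleinman-Newton) iteration,
# and projected gradient descent on a small synthetic continuous-time LQR
# instance.  Problem data, stepsize, and sparsity pattern are illustrative.
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

rng = np.random.default_rng(0)
n, m = 4, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Q, R = np.eye(n), np.eye(m)   # state and input cost weights
Sigma = np.eye(n)             # second moment of the initial state

def is_stabilizing(K):
    return np.all(np.linalg.eigvals(A - B @ K).real < 0)

def cost_and_grad(K):
    """f(K) = tr(P_K Sigma), grad f(K) = 2 (R K - B^T P_K) Y_K, where P_K
    and Y_K solve the two closed-loop Lyapunov equations below."""
    Acl = A - B @ K
    # (A - BK)^T P + P (A - BK) + Q + K^T R K = 0
    P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
    # (A - BK) Y + Y (A - BK)^T + Sigma = 0
    Y = solve_continuous_lyapunov(Acl, -Sigma)
    E = R @ K - B.T @ P        # descent direction before weighting by Y
    return np.trace(P @ Sigma), 2 * E @ Y, E, P

# Start from a stabilizing gain: perturb the optimal gain and shrink the
# perturbation until A - BK is Hurwitz (the stabilizing set is open).
P_star = solve_continuous_are(A, B, Q, R)
K_star = np.linalg.solve(R, B.T @ P_star)
K = K_star + 0.1 * rng.standard_normal((m, n))
while not is_stabilizing(K):
    K = K_star + 0.5 * (K - K_star)

eta = 1e-3   # illustrative fixed stepsize; the paper derives explicit criteria
for _ in range(5000):
    f, G, E, P = cost_and_grad(K)
    K = K - eta * G                    # gradient descent
    # K = K - eta * 2 * E              # natural gradient descent: G Y^{-1} = 2E
    # K = np.linalg.solve(R, B.T @ P)  # quasi-Newton with stepsize 1/2,
    #                                  # i.e. the Kleinman-Newton iteration
print("gap to optimum:", cost_and_grad(K)[0] - np.trace(P_star @ Sigma))

# Projected gradient descent under a (hypothetical) sparsity pattern: zero
# out forbidden entries after each step.  The initial pattern-restricted
# gain must itself be stabilizing, which is not guaranteed in general.
mask = np.ones((m, n)); mask[0, n - 1] = 0.0
Ks = K * mask
if is_stabilizing(Ks):
    for _ in range(5000):
        _, G, _, _ = cost_and_grad(Ks)
        Ks = (Ks - eta * G) * mask     # Euclidean projection onto the pattern
```

Note the commented quasi-Newton line: with stepsize $1/2$ the update $K - R^{-1}(RK - B^\top P_K)$ collapses to $R^{-1}B^\top P_K$, which is exactly the Kleinman-Newton iteration the abstract says is recovered; swapping it in for the gradient step makes the loop converge in a handful of iterations, consistent with a $Q$-quadratic rate.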