Paper Title

On Entropy Regularized Path Integral Control for Trajectory Optimization

Paper Authors

Tom Lefebvre, Guillaume Crevecoeur

Paper Abstract

In this article we present a generalised view on Path Integral Control (PIC) methods. PIC refers to a particular class of policy search methods that are closely tied to the setting of Linearly Solvable Optimal Control (LSOC), a restricted subclass of nonlinear Stochastic Optimal Control (SOC) problems. This class is unique in the sense that it can be solved explicitly to yield a formal optimal state trajectory distribution. In this contribution we first review PIC theory and discuss related algorithms tailored to policy search in general. We identify a generic design strategy that relies on the existence of an optimal state trajectory distribution and finds a parametric policy by minimizing the cross entropy between the optimal state trajectory distribution and one parametrized through the policy. Inspired by this observation, we then aim to formulate a SOC problem that shares traits with the LSOC setting yet covers a less restrictive class of problem formulations. We refer to this SOC problem as Entropy Regularized Trajectory Optimization. The problem is closely related to the Entropy Regularized Stochastic Optimal Control setting, which has lately been addressed frequently by the Reinforcement Learning (RL) community. We analyse the theoretical convergence behaviour of the resulting state trajectory distribution sequence and draw connections with stochastic search methods tailored to classic optimization problems. Finally, we derive explicit updates and compare the implied Entropy Regularized PIC with earlier work in the context of both PIC and RL for derivative-free trajectory optimization.
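As a minimal illustration of the generic design strategy mentioned in the abstract (reweighting sampled rollouts with exponentiated negative cost so that they approximate the optimal state trajectory distribution, then refitting a parametric policy by cross-entropy minimization), the sketch below runs a derivative-free, PIC-style update on a toy one-dimensional problem. The dynamics, cost, temperature lam and perturbation scale sigma are illustrative assumptions and not the paper's problem formulation; the weighted mean is simply the moment-matching solution of the cross-entropy fit for a Gaussian parameter distribution.

import numpy as np

# Toy sketch (not the paper's formulation) of a PIC-style, derivative-free update:
# perturb the policy parameter, weight sampled rollouts with exponentiated negative
# cost, and refit the parameter by cross-entropy (moment) matching.

def rollout_cost(theta, horizon=20, noise_std=0.3, rng=None):
    """Accumulated quadratic cost of a 1-D point mass under a linear policy."""
    rng = np.random.default_rng() if rng is None else rng
    x, cost = 0.5, 0.0
    for _ in range(horizon):
        u = theta * x + noise_std * rng.standard_normal()  # stochastic policy
        x = x + 0.1 * u                                     # toy dynamics
        cost += x ** 2 + 0.01 * u ** 2                      # toy stage cost
    return cost

def pic_update(theta, n_samples=64, lam=1.0, sigma=0.2, rng=None):
    """One derivative-free parameter update with exponentiated-cost weights."""
    rng = np.random.default_rng() if rng is None else rng
    thetas = theta + sigma * rng.standard_normal(n_samples)
    costs = np.array([rollout_cost(t, rng=rng) for t in thetas])
    weights = np.exp(-(costs - costs.min()) / lam)          # path-integral weights
    weights /= weights.sum()
    return float(weights @ thetas)                          # cross-entropy refit

theta = 0.0
for _ in range(30):
    theta = pic_update(theta)
print("final feedback gain:", theta)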
