Deep reinforcement learning under signal temporal logic constraints using Lagrangian relaxation

Paper title

Deep reinforcement learning under signal temporal logic constraints using Lagrangian relaxation

Authors

Ikemoto, Junya; Ushio, Toshimitsu

Abstract

Deep reinforcement learning (DRL) has attracted much attention as an approach to solving optimal control problems without mathematical models of the systems involved. In general, however, constraints may be imposed on optimal control problems. In this study, we consider optimal control problems with constraints that require temporal control tasks to be completed. We describe the constraints using signal temporal logic (STL), which is useful for time-sensitive control tasks because it can specify continuous signals within bounded time intervals. To deal with the STL constraints, we introduce an extended constrained Markov decision process (CMDP), called a $τ$-CMDP. We formulate the STL-constrained optimal control problem as a $τ$-CMDP and propose a two-phase constrained DRL algorithm using the Lagrangian relaxation method. Through simulations, we also demonstrate the learning performance of the proposed algorithm.
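
As a rough illustration of the Lagrangian relaxation idea named in the abstract, the sketch below solves a toy scalar constrained problem by alternating a gradient step on the decision variable with a projected gradient step on the multiplier. It is not the two-phase DRL algorithm of the paper and does not involve STL or a $τ$-CMDP; the objective, the constraint, the function name `lagrangian_relaxation_demo`, and the step sizes are all assumptions chosen for illustration.

```python
def lagrangian_relaxation_demo(steps=2000, lr_x=0.05, lr_lam=0.05):
    """Toy problem (assumed for illustration only):
    maximize f(x) = -(x - 2)^2  subject to  g(x) = x - 1 <= 0.

    Lagrangian: L(x, lam) = f(x) - lam * g(x), with lam >= 0.
    The constrained optimum is x = 1 with multiplier lam = 2.
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        # Primal step: gradient ascent on L with respect to x.
        grad_x = -2.0 * (x - 2.0) - lam
        x += lr_x * grad_x
        # Dual step: increase lam while the constraint is violated,
        # then project back onto lam >= 0.
        lam = max(0.0, lam + lr_lam * (x - 1.0))
    return x, lam


if __name__ == "__main__":
    x_opt, lam_opt = lagrangian_relaxation_demo()
    print(f"x = {x_opt:.4f}, lambda = {lam_opt:.4f}")
```

In a constrained DRL setting, the primal step would typically be replaced by a policy update on the reward minus the multiplier-weighted constraint cost, while the multiplier update keeps the same projected form.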
