A Stochastic Maximum Principle Approach for Reinforcement Learning with Parameterized Environment

Paper title

A Stochastic Maximum Principle Approach for Reinforcement Learning with Parameterized Environment

Authors

Richard Archibald, Feng Bao, Jiongmin Yong

Abstract

In this work, we introduce a stochastic maximum principle (SMP) approach for solving the reinforcement learning problem, under the assumption that the unknowns in the environment can be parameterized based on physics knowledge. To develop numerical algorithms, we apply an effective online parameter estimation method as our exploration technique to estimate the environment parameters during training, and we achieve exploitation of the optimal policy through an efficient backward action learning method for policy improvement under the SMP framework. Numerical experiments are presented to demonstrate that our SMP approach for reinforcement learning can produce reliable control policies, and that the gradient-descent-type optimization in the SMP solver requires fewer training episodes than standard methods based on the dynamic programming principle.
