Paper Title
A Multilevel Reinforcement Learning Framework for PDE-based Control
Paper Authors
Paper Abstract
Reinforcement learning (RL) is a promising method for solving control problems. However, model-free RL algorithms are sample-inefficient and require thousands, if not millions, of samples to learn optimal control policies. A major source of computational cost in RL is the transition function, which is dictated by the model dynamics. This is especially problematic when the model dynamics are represented by coupled PDEs: in such cases, the transition function often involves solving a large-scale discretization of those PDEs. We propose a multilevel RL framework that eases this cost by exploiting sublevel models that correspond to coarser-scale discretizations (i.e., multilevel models). This is done by formulating an approximate multilevel Monte Carlo estimate of the objective function of the policy and/or value network, in place of the Monte Carlo estimates used in the classical framework. As a demonstration of this framework, we present a multilevel version of the proximal policy optimization (PPO) algorithm, where a level refers to the grid fidelity of the chosen simulation-based environment. We provide two examples of simulation-based environments governed by stochastic PDEs solved using a finite-volume discretization. For the case studies presented, we observe substantial computational savings with multilevel PPO compared to its classical counterpart.
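To make the multilevel idea referenced in the abstract concrete, the following is a minimal sketch of the standard multilevel Monte Carlo telescoping estimator applied to a generic objective; the notation ($J_\ell$ for the objective evaluated on the level-$\ell$ discretization, $L$ for the finest level, $N_\ell$ for the number of samples at level $\ell$) is illustrative and not necessarily the paper's own formulation.

% Standard MLMC telescoping identity:
%   E[J_L] = E[J_0] + \sum_{\ell=1}^{L} E[J_\ell - J_{\ell-1}],
% with each expectation estimated independently, using many cheap
% coarse-level samples and few expensive fine-level samples.
\begin{equation*}
  \widehat{J}^{\mathrm{ML}}
  = \frac{1}{N_0} \sum_{i=1}^{N_0} J_0^{(i)}
  + \sum_{\ell=1}^{L} \frac{1}{N_\ell} \sum_{i=1}^{N_\ell}
      \left( J_\ell^{(i)} - J_{\ell-1}^{(i)} \right),
  \qquad N_0 \gg N_1 \gg \dots \gg N_L .
\end{equation*}

In the multilevel PPO setting described above, each $J_\ell$ would presumably be the policy (and/or value) objective estimated from rollouts in the level-$\ell$ grid-fidelity environment, so that most samples are drawn from cheap coarse-grid simulations and only a few from the expensive fine-grid PDE solver.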