Paper Title
Backpropagation through Time and Space: Learning Numerical Methods with Multi-Agent Reinforcement Learning
Paper Authors
Paper Abstract
We introduce Backpropagation Through Time and Space (BPTTS), a method for training a recurrent spatio-temporal neural network that is used in a homogeneous multi-agent reinforcement learning (MARL) setting to learn numerical methods for hyperbolic conservation laws. We treat the numerical schemes underlying partial differential equations (PDEs) as a Partially Observable Markov Game (POMG) in Reinforcement Learning (RL). Like a numerical solver, our agent acts at each discrete location of a computational space for efficient and generalizable learning. To learn higher-order spatial methods by acting on local states, the agent must discern how its actions at a given spatio-temporal location affect the future evolution of the state. BPTTS addresses this manifestation of non-stationarity by allowing gradients to flow across both space and time. The learned numerical policies are comparable to state-of-the-art (SOTA) numerical methods in two settings, the Burgers' equation and the Euler equations, and generalize well to other simulation setups.
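To make the mechanism concrete, below is a minimal JAX sketch of the BPTTS idea, not the authors' implementation: one shared policy network (the homogeneous agent) emits a numerical flux from a local stencil at every grid cell, the scheme is unrolled over time for Burgers' equation u_t + (u^2/2)_x = 0, and a single jax.grad call differentiates through the whole space-time rollout. The MLP architecture, the 3-point stencil, the flux parameterization, and the mean-squared loss against a reference trajectory are illustrative assumptions; the paper's actual reward design and training loop may differ.

```python
import jax
import jax.numpy as jnp

def policy(params, stencil):
    # Tiny shared MLP: every grid location runs this same network
    # (the homogeneous-agent assumption of the MARL setting).
    h = jnp.tanh(stencil @ params["w1"] + params["b1"])
    return (h @ params["w2"] + params["b2"])[0]

def step(params, u, dt, dx):
    # Conservative update for Burgers' equation u_t + (u^2/2)_x = 0 on a
    # periodic grid, with the hand-designed numerical flux replaced by the
    # agent's output. Each agent sees a local 3-point stencil; neighboring
    # cells share interface fluxes, which couples the agents in space.
    stencils = jnp.stack([jnp.roll(u, 1), u, jnp.roll(u, -1)], axis=-1)
    flux = jax.vmap(lambda s: policy(params, s))(stencils)  # one agent per cell
    return u - dt / dx * (flux - jnp.roll(flux, 1))

def rollout_loss(params, u0, u_ref, dt, dx, n_steps):
    # Unroll the learned scheme and score it against a reference trajectory
    # (a placeholder here, standing in for a supervisory signal).
    def body(u, _):
        u_next = step(params, u, dt, dx)
        return u_next, u_next
    _, traj = jax.lax.scan(body, u0, None, length=n_steps)
    return jnp.mean((traj - u_ref) ** 2)

# Differentiating the unrolled solver is the "through time and space" part:
# gradients pass through every timestep and, via the stencils, every cell.
grad_fn = jax.grad(rollout_loss)

# Shape-level usage with random parameters (hypothetical sizes, not results).
key1, key2 = jax.random.split(jax.random.PRNGKey(0))
params = {"w1": 0.1 * jax.random.normal(key1, (3, 16)), "b1": jnp.zeros(16),
          "w2": 0.1 * jax.random.normal(key2, (16, 1)), "b2": jnp.zeros(1)}
u0 = jnp.sin(2 * jnp.pi * jnp.linspace(0.0, 1.0, 64, endpoint=False))
u_ref = jnp.zeros((50, 64))  # stand-in for a high-resolution reference
grads = grad_fn(params, u0, u_ref, 1e-3, 1.0 / 64, 50)
```

Because every location shares parameters and the conservative update couples neighboring cells, the gradient of the rollout loss propagates both backward through time and sideways through space, which is the gradient flow the abstract attributes to BPTTS.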