Paper Title

Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation

Paper Authors

Pier Giuseppe Sessa, Maryam Kamgarpour, Andreas Krause

Paper Abstract

We consider model-based multi-agent reinforcement learning, where the environment transition model is unknown and can only be learned via expensive interactions with the environment. We propose H-MARL (Hallucinated Multi-Agent Reinforcement Learning), a novel sample-efficient algorithm that can efficiently balance exploration, i.e., learning about the environment, and exploitation, i.e., achieving good equilibrium performance in the underlying general-sum Markov game. H-MARL builds high-probability confidence intervals around the unknown transition model and sequentially updates them based on newly observed data. Using these, it constructs, at each round, an optimistic hallucinated game for the agents and computes its equilibrium policies. We consider general statistical models (e.g., Gaussian processes, deep ensembles, etc.) and policy classes (e.g., deep neural networks), and theoretically analyze our approach by bounding the agents' dynamic regret. Moreover, we provide a convergence rate to the equilibria of the underlying Markov game. We demonstrate our approach experimentally on an autonomous driving simulation benchmark. H-MARL learns successful equilibrium policies after a few interactions with the environment and significantly improves performance compared to non-optimistic exploration methods.
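To make the loop described in the abstract concrete, below is a minimal sketch, not the paper's implementation: it assumes a toy one-dimensional Markov game with a shared reward, a scikit-learn Gaussian process as the statistical model of the unknown transitions, and a crude grid search standing in for a general-sum equilibrium solver. All names (true_dynamics, optimistic_transition, compute_equilibrium) are hypothetical and chosen only for illustration.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def true_dynamics(state, joint_action):
    # Unknown to the agents; observable only through expensive environment interactions.
    return 0.9 * state + 0.1 * float(np.sum(joint_action)) + 0.01 * rng.normal()

def optimistic_transition(gp, state, joint_action, beta=2.0):
    # Optimistic (upper-confidence) hallucinated transition: posterior mean + beta * std.
    x = np.concatenate(([state], joint_action)).reshape(1, -1)
    mean, std = gp.predict(x, return_std=True)
    return float(mean[0] + beta * std[0])

def compute_equilibrium(transition_fn, state):
    # Stand-in for equilibrium computation over the hallucinated game:
    # with a shared toy reward, a symmetric grid search over joint actions suffices here.
    best_action, best_value = None, -np.inf
    for a in np.linspace(-1.0, 1.0, 21):
        joint_action = np.array([a, a])
        next_state = transition_fn(state, joint_action)
        value = -(next_state - 0.5) ** 2  # toy shared reward: steer the state toward 0.5
        if value > best_value:
            best_action, best_value = joint_action, value
    return best_action

# H-MARL-style loop: fit the transition model, build the optimistic hallucinated game,
# compute equilibrium policies, act in the real environment, and update the data.
X, Y = [], []
state = 0.0
for episode in range(10):
    if X:
        gp = GaussianProcessRegressor().fit(np.array(X), np.array(Y))
        joint_action = compute_equilibrium(
            lambda s, a: optimistic_transition(gp, s, a), state)
    else:
        joint_action = rng.uniform(-1.0, 1.0, size=2)  # no data yet: act randomly
    next_state = true_dynamics(state, joint_action)
    X.append(np.concatenate(([state], joint_action)))
    Y.append(next_state)
    state = next_state

In this sketch, beta controls how optimistic the hallucinated transitions are; in the paper the degree of optimism comes from the high-probability confidence intervals around the transition model, whereas here it is just a fixed constant.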
