A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games

Paper Title

A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games

Authors

Sokota, Samuel; D'Orazio, Ryan; Kolter, J. Zico; Loizou, Nicolas; Lanctot, Marc; Mitliagkas, Ioannis; Brown, Noam; Kroer, Christian

Abstract

This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm. Our contribution is demonstrating the virtues of magnetic mirror descent as both an equilibrium solver and as an approach to reinforcement learning in two-player zero-sum games. These virtues include: 1) Being the first quantal response equilibria solver to achieve linear convergence for extensive-form games with first-order feedback; 2) Being the first standard reinforcement learning algorithm to achieve empirically competitive results with CFR in tabular settings; 3) Achieving favorable performance in 3x3 Dark Hex and Phantom Tic-Tac-Toe as a self-play deep reinforcement learning algorithm.

