Paper Title
On Last-Iterate Convergence Beyond Zero-Sum Games
Paper Authors
Paper Abstract
Most existing results about \emph{last-iterate convergence} of learning dynamics are limited to two-player zero-sum games, and only apply under rigid assumptions about what dynamics the players follow. In this paper we provide new results and techniques that apply to broader families of games and learning dynamics. First, we use a regret-based analysis to show that in a class of games that includes constant-sum polymatrix and strategically zero-sum games, dynamics such as \emph{optimistic mirror descent (OMD)} have \emph{bounded second-order path lengths}, a property which holds even when players employ different algorithms and prediction mechanisms. This enables us to obtain $O(1/\sqrt{T})$ rates and optimal $O(1)$ regret bounds. Our analysis also reveals a surprising property: OMD either reaches arbitrarily close to a Nash equilibrium, or it outperforms the \emph{robust price of anarchy} in efficiency. Moreover, for potential games we establish convergence to an $\epsilon$-equilibrium after $O(1/\epsilon^2)$ iterations for mirror descent under a broad class of regularizers, as well as optimal $O(1)$ regret bounds for OMD variants. Our framework also extends to near-potential games, and unifies known analyses for distributed learning in Fisher's market model. Finally, we analyze the convergence, efficiency, and robustness of \emph{optimistic gradient descent (OGD)} in general-sum continuous games.
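To make the optimistic update concrete, the following is a minimal numerical sketch, not the paper's setting or proof technique: it runs the standard single-call optimistic gradient descent/ascent update on an unconstrained bilinear zero-sum game, where last-iterate convergence of OGD (in contrast to the cycling of plain gradient descent/ascent) is well known. The payoff matrix A, step size eta, and iteration count are arbitrary choices for illustration.

```python
import numpy as np

# Illustrative sketch: optimistic gradient descent/ascent on the
# unconstrained bilinear zero-sum game  min_x max_y  f(x, y) = x^T A y,
# using the single-call optimistic update
#   x_{t+1} = x_t - eta * (2*g_t - g_{t-1}),
# and symmetrically (with a plus sign) for the max player.
# The unique equilibrium of this game is (0, 0) since A is invertible.

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])            # fixed invertible payoff matrix (assumption)
x = np.ones(2)                         # min player's initial point
y = np.ones(2)                         # max player's initial point
gx_prev, gy_prev = A @ y, A.T @ x      # previous gradients, used as predictions
eta = 0.05                             # step size, small relative to ||A||

for t in range(5000):
    gx, gy = A @ y, A.T @ x            # gradients of f w.r.t. x and y at (x_t, y_t)
    x = x - eta * (2 * gx - gx_prev)   # min player: optimistic descent step
    y = y + eta * (2 * gy - gy_prev)   # max player: optimistic ascent step
    gx_prev, gy_prev = gx, gy

# Distance of the last iterate from the equilibrium (0, 0); it shrinks over time,
# whereas the non-optimistic update (eta * gx only) would orbit the equilibrium.
print(np.linalg.norm(x), np.linalg.norm(y))
```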