A unified stochastic approximation framework for learning in games

Paper title

A unified stochastic approximation framework for learning in games

Authors

Panayotis Mertikopoulos, Ya-Ping Hsieh, Volkan Cevher

Abstract

We develop a flexible stochastic approximation framework for analyzing the long-run behavior of learning in games (both continuous and finite). The proposed analysis template incorporates a wide array of popular learning algorithms, including gradient-based methods, the exponential/multiplicative weights algorithm for learning in finite games, optimistic and bandit variants of the above, etc. In addition to providing an integrated view of these algorithms, our framework further allows us to obtain several new convergence results, both asymptotic and in finite time, in both continuous and finite games. Specifically, we provide a range of criteria for identifying classes of Nash equilibria and sets of action profiles that are attracting with high probability, and we also introduce the notion of coherence, a game-theoretic property that includes strict and sharp equilibria, and which leads to convergence in finite time. Importantly, our analysis applies to both oracle-based and bandit, payoff-based methods - that is, when players only observe their realized payoffs.

Paper title