Paper Title

No-Regret Learning in Network Stochastic Zero-Sum Games

Authors

Shijie Huang, Jinlong Lei, Yiguang Hong

Abstract

No-regret learning has been widely used to compute a Nash equilibrium in two-person zero-sum games. However, there is still a lack of regret analysis for network stochastic zero-sum games, where players competing in two subnetworks only have access to some local information, and the cost functions include uncertainty. Such a game model arises in security games, where a group of inspectors works together to detect a group of evaders. In this paper, we propose a distributed stochastic mirror descent (D-SMD) method, and establish the regret bounds $O(\sqrt{T})$ and $O(\log T)$ in the expected sense for convex-concave and strongly convex-strongly concave costs, respectively. Our bounds match those of the best known first-order online optimization algorithms. We then prove the convergence of the time-averaged iterates of D-SMD to the set of Nash equilibria. Finally, we show that the actual iterates of D-SMD almost surely converge to the Nash equilibrium in the strictly convex-strictly concave setting.
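To make the kind of update the abstract describes concrete, below is a minimal sketch of one distributed stochastic mirror descent round. It assumes an entropic mirror map on the probability simplex (the exponentiated-gradient update) and a doubly stochastic weight matrix `W` for in-network averaging; the paper's exact update rule, step-size schedule, and information structure may differ, and the names `smd_step`, `d_smd_round`, and `noisy_grad` are illustrative only.

```python
# A minimal sketch of distributed stochastic mirror descent (D-SMD),
# assuming an entropic mirror map on the simplex and consensus averaging
# with a doubly stochastic matrix W. Illustrative, not the paper's exact method.
import numpy as np

def smd_step(x, grad, eta):
    """Entropic mirror descent (exponentiated gradient) step on the simplex."""
    y = x * np.exp(-eta * grad)
    return y / y.sum()

def d_smd_round(X, W, grad_oracle, eta):
    """One round: each player averages neighbors' iterates with weights
    W[i, :], then takes a mirror step along its noisy local gradient."""
    X_mixed = W @ X  # consensus/averaging over the subnetwork
    G = np.array([grad_oracle(i, X_mixed[i]) for i in range(X.shape[0])])
    return np.array([smd_step(X_mixed[i], G[i], eta) for i in range(X.shape[0])])

# Toy usage: 3 players on a 4-point simplex with noisy linear costs.
rng = np.random.default_rng(0)
n, d, T = 3, 4, 2000
c = rng.normal(size=(n, d))  # hypothetical true cost vectors
noisy_grad = lambda i, x: c[i] + 0.1 * rng.normal(size=d)  # stochastic feedback
W = np.full((n, n), 1.0 / n)  # complete-graph averaging weights
X = np.full((n, d), 1.0 / d)  # uniform initial strategies
avg = np.zeros_like(X)
for t in range(1, T + 1):
    X = d_smd_round(X, W, noisy_grad, eta=1.0 / np.sqrt(t))
    avg += (X - avg) / t  # time-averaged iterates, as analyzed in the paper
```

The $\eta_t = 1/\sqrt{t}$ step size mirrors the schedule typically used to obtain $O(\sqrt{T})$ expected regret in the convex-concave case; under strong convexity, an $O(1/t)$ schedule is the usual choice for the $O(\log T)$ bound.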
