Paper Title

No-Regret Learning in Network Stochastic Zero-Sum Games

Authors

Shijie Huang, Jinlong Lei, Yiguang Hong

Abstract

No-regret learning has been widely used to compute a Nash equilibrium in two-person zero-sum games. However, there is still a lack of regret analysis for network stochastic zero-sum games, where players competing in two subnetworks only have access to some local information, and the cost functions include uncertainty. Such a game model arises in security games, where a group of inspectors works together to detect a group of evaders. In this paper, we propose a distributed stochastic mirror descent (D-SMD) method, and establish the regret bounds $O(\sqrt{T})$ and $O(\log T)$ in the expected sense for convex-concave and strongly convex-strongly concave costs, respectively. Our bounds match those of the best known first-order online optimization algorithms. We then prove the convergence of the time-averaged iterates of D-SMD to the set of Nash equilibria. Finally, we show that the actual iterates of D-SMD almost surely converge to the Nash equilibrium in the strictly convex-strictly concave setting.
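To make the kind of update the abstract describes concrete, below is a minimal sketch of one distributed stochastic mirror descent round. It assumes an entropic mirror map on the probability simplex (the exponentiated-gradient update) and a doubly stochastic weight matrix `W` for in-network averaging; the paper's exact update rule, step-size schedule, and information structure may differ, and the names `smd_step`, `d_smd_round`, and `noisy_grad` are illustrative only.

```python
# A minimal sketch of distributed stochastic mirror descent (D-SMD),
# assuming an entropic mirror map on the simplex and consensus averaging
# with a doubly stochastic matrix W. Illustrative, not the paper's exact method.
import numpy as np

def smd_step(x, grad, eta):
    """Entropic mirror descent (exponentiated gradient) step on the simplex."""
    y = x * np.exp(-eta * grad)
    return y / y.sum()

def d_smd_round(X, W, grad_oracle, eta):
    """One round: each player averages neighbors' iterates with weights
    W[i, :], then takes a mirror step along its noisy local gradient."""
    X_mixed = W @ X  # consensus/averaging over the subnetwork
    G = np.array([grad_oracle(i, X_mixed[i]) for i in range(X.shape[0])])
    return np.array([smd_step(X_mixed[i], G[i], eta) for i in range(X.shape[0])])

# Toy usage: 3 players on a 4-point simplex with noisy linear costs.
rng = np.random.default_rng(0)
n, d, T = 3, 4, 2000
c = rng.normal(size=(n, d))  # hypothetical true cost vectors
noisy_grad = lambda i, x: c[i] + 0.1 * rng.normal(size=d)  # stochastic feedback
W = np.full((n, n), 1.0 / n)  # complete-graph averaging weights
X = np.full((n, d), 1.0 / d)  # uniform initial strategies
avg = np.zeros_like(X)
for t in range(1, T + 1):
    X = d_smd_round(X, W, noisy_grad, eta=1.0 / np.sqrt(t))
    avg += (X - avg) / t  # time-averaged iterates, as analyzed in the paper
```

The $\eta_t = 1/\sqrt{t}$ step size mirrors the schedule typically used to obtain $O(\sqrt{T})$ expected regret in the convex-concave case; under strong convexity, an $O(1/t)$ schedule is the usual choice for the $O(\log T)$ bound.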
