论文标题

在Black-Box广泛的游戏中查找和认证(近)最佳策略

Finding and Certifying (Near-)Optimal Strategies in Black-Box Extensive-Form Games

论文作者

Zhang, Brian Hu, Sandholm, Tuomas

论文摘要

通常 - 例如,在战争游戏,策略视频游戏和财务模拟中,该游戏仅作为黑色框模拟器提供给我们。在这些设置中,由于游戏可能具有未知的自然动作分布(我们只能从中获得样本)和/或太大而无法完全扩展,因此很难通过可利用性来确保计算策略。最近的工作\ cite {zhang20:small}产生了广泛形式游戏的证书概念,可以保证可剥削性,同时不扩展完整的游戏树。但是,这项工作认为黑匣子可以随时采样或扩展游戏树的任意节点,并且可以进行一系列确切的游戏求解(例如,可以进行线性编程)来计算证书。这两个假设都严重限制了该方法的实际适用性。在这项工作中,我们放松了这两个假设。我们表明,可以使用黑匣子获得高概率证书,该黑匣子只能在游戏中玩,只能将遗憾的最小化器作为子例程。作为奖励,我们在广泛形式的游戏设置中获得了带有$ \ tilde o(1/\ sqrt {t})$收敛速率的均衡访问算法,该算法不依赖于具有较低型的接触率概率的采样策略(MCCFR假定)。我们在实验上证明,在黑框设置中,我们的方法能够提供非平凡的可利用性保证,同时仅扩展了一小部分游戏树。

Often -- for example in war games, strategy video games, and financial simulations -- the game is given to us only as a black-box simulator in which we can play it. In these settings, since the game may have unknown nature action distributions (from which we can only obtain samples) and/or be too large to expand fully, it can be difficult to compute strategies with guarantees on exploitability. Recent work \cite{Zhang20:Small} resulted in a notion of certificate for extensive-form games that allows exploitability guarantees while not expanding the full game tree. However, that work assumed that the black box could sample or expand arbitrary nodes of the game tree at any time, and that a series of exact game solves (via, for example, linear programming) can be conducted to compute the certificate. Each of those two assumptions severely restricts the practical applicability of that method. In this work, we relax both of the assumptions. We show that high-probability certificates can be obtained with a black box that can do nothing more than play through games, using only a regret minimizer as a subroutine. As a bonus, we obtain an equilibrium-finding algorithm with $\tilde O(1/\sqrt{T})$ convergence rate in the extensive-form game setting that does not rely on a sampling strategy with lower-bounded reach probabilities (which MCCFR assumes). We demonstrate experimentally that, in the black-box setting, our methods are able to provide nontrivial exploitability guarantees while expanding only a small fraction of the game tree.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源