论文标题
随机遗憾的最小化在广泛的游戏中
Stochastic Regret Minimization in Extensive-Form Games
论文作者
论文摘要
蒙特 - 卡洛反事实遗憾最小化(MCCFR)是用于求解顺序游戏的最新算法,对于全树遍历而言太大。它通过使用可以通过抽样计算的梯度估计来起作用。但是,顺序游戏的随机方法尚未在MCCFR之外进行广泛研究。在本文中,我们开发了一个新的框架来开发随机遗憾的最小化方法。该框架使我们能够使用任何遗憾最小化算法以及任何梯度估计器。可以将MCCFR算法作为我们框架的特殊情况进行分析,并且该分析导致有关收敛性的显着理论,同时产生了简化的证明。我们的框架使我们能够实例化几种新的随机方法来解决顺序游戏。我们在三场比赛中展示了广泛的实验,我们的方法的一些变体都优于MCCFR。
Monte-Carlo counterfactual regret minimization (MCCFR) is the state-of-the-art algorithm for solving sequential games that are too large for full tree traversals. It works by using gradient estimates that can be computed via sampling. However, stochastic methods for sequential games have not been investigated extensively beyond MCCFR. In this paper we develop a new framework for developing stochastic regret minimization methods. This framework allows us to use any regret-minimization algorithm, coupled with any gradient estimator. The MCCFR algorithm can be analyzed as a special case of our framework, and this analysis leads to significantly-stronger theoretical on convergence, while simultaneously yielding a simplified proof. Our framework allows us to instantiate several new stochastic methods for solving sequential games. We show extensive experiments on three games, where some variants of our methods outperform MCCFR.