Title
Mean-Field Games With Finitely Many Players: Independent Learning and Subjectivity
Authors
Abstract
Independent learners are agents that employ single-agent algorithms in multi-agent systems, intentionally ignoring the effect of other strategic agents. This paper studies mean-field games from a decentralized learning perspective, with two primary objectives: (i) to identify structure that can guide algorithm design, and (ii) to understand the emergent behaviour in systems of independent learners. We study a new model of partially observed mean-field games with finitely many players, local action observability, and a general observation channel for partial observations of the global state. Specific observation channels considered include (a) global observability, (b) local and mean-field observability, (c) local and compressed mean-field observability, and (d) only local observability. We establish conditions under which the control problem of a given agent is equivalent to a fully observed MDP, as well as conditions under which the control problem is equivalent only to a POMDP. Building on the connection to MDPs, we prove the existence of perfect equilibrium among memoryless stationary policies under mean-field observability. Leveraging the connection to POMDPs, we prove convergence of learning iterates obtained by independent learning agents under any of the aforementioned observation channels. We interpret the limiting values as subjective value functions, which an agent believes to be relevant to its control problem. These subjective value functions are then used to propose subjective Q-equilibrium, a new solution concept for partially observed n-player mean-field games, whose existence is proved under mean-field or global observability. We provide a decentralized learning algorithm for partially observed n-player mean-field games, and we show that it drives play to subjective Q-equilibrium by adapting the recently developed theory of satisficing paths to allow for subjectivity.
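To make the opening notion concrete: an independent learner runs an ordinary single-agent algorithm (here, tabular Q-learning) on its own local observations, folding the behaviour of all other agents into the environment's transitions. The following is a minimal illustrative sketch, not code from the paper; the class name and parameters are chosen for exposition.

```python
# Illustrative sketch of an "independent learner": each agent applies
# single-agent tabular Q-learning to its local observations, ignoring
# the presence of other strategic agents (their effect appears only as
# apparent non-stationarity in the transitions).
import random
from collections import defaultdict

class IndependentQLearner:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)  # (observation, action) -> value estimate
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, obs):
        # Epsilon-greedy action selection on the agent's own Q-table.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(obs, a)])

    def update(self, obs, action, reward, next_obs):
        # Standard single-agent Q-learning update: the other agents'
        # influence is absorbed into the observed reward and next_obs.
        best_next = max(self.q[(next_obs, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(obs, action)] += self.alpha * (target - self.q[(obs, action)])
```

Under partial observability, `obs` would be whatever the observation channel provides (global state, local state plus mean field, a compressed mean field, or local state only); the limiting Q-values such an agent converges to are what the paper interprets as subjective value functions.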