Paper Title
Model Free Reinforcement Learning Algorithm for Stationary Mean field Equilibrium for Multiple Types of Agents
Paper Authors
Paper Abstract
We consider a multi-agent Markov strategic interaction over an infinite horizon in which the agents can be of multiple types. We model the strategic interaction as a mean-field game in the asymptotic limit where the number of agents of each type becomes infinite. Each agent has a private state; the state evolves depending on the agent's action and the distribution of the states of the agents of the different types. Each agent seeks to maximize the discounted sum of rewards over the infinite horizon, which depends on the agent's own state and the distribution of the states of the agents of the different types. We seek to characterize and compute a stationary multi-type mean-field equilibrium (MMFE) in the above game, and we characterize the conditions under which a stationary MMFE exists. Finally, we propose a reinforcement learning (RL) based algorithm using a policy gradient approach to find the stationary MMFE when the agents are unaware of the dynamics. Numerically, we evaluate how such an interaction can model cyber attacks between defenders and adversaries, and show how the RL-based algorithm converges to an equilibrium.
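For concreteness, the stationary MMFE described in the abstract can be read as the fixed point of a best-response condition coupled with a consistency condition. The sketch below is an assumed formalization, not notation taken from the paper: K denotes the number of types, S and A the state and action spaces, β the discount factor, r_k and P_k the reward and transition kernel of type k (both depending on the joint state distribution μ = (μ_1, …, μ_K)), π_k a stationary policy of type k, and μ_k the stationary state distribution of type k.

\[
\pi_k^* \in \arg\max_{\pi_k}\;
\mathbb{E}\!\left[\sum_{t=0}^{\infty} \beta^{t}\, r_k\big(s_t, a_t, \mu^*\big)\;\middle|\;
s_0 = s,\; a_t \sim \pi_k(\cdot \mid s_t),\; s_{t+1} \sim P_k(\cdot \mid s_t, a_t, \mu^*)\right]
\quad \text{for all } s \in S,\; k = 1,\dots,K,
\]
\[
\mu_k^*(s') \;=\; \sum_{s \in S} \sum_{a \in A} \mu_k^*(s)\, \pi_k^*(a \mid s)\, P_k(s' \mid s, a, \mu^*),
\qquad \mu^* = (\mu_1^*,\dots,\mu_K^*).
\]

In words, each type's stationary policy must be optimal against the fixed joint distribution μ*, and μ* must in turn be invariant under the dynamics those policies induce; the policy-gradient RL algorithm mentioned in the abstract can then be viewed as searching for such a fixed point without knowledge of P_k.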