Multi-Issue Bargaining With Deep Reinforcement Learning

Paper Title

Multi-Issue Bargaining With Deep Reinforcement Learning

Author

Chang, Ho-Chun Herbert

Abstract

Negotiation is a process where agents aim to work through disputes and maximize their surplus. As the use of deep reinforcement learning in bargaining games is unexplored, this paper evaluates its ability to exploit, adapt, and cooperate to produce fair outcomes. Two actor-critic networks were trained for the bidding and acceptance strategy, against time-based agents, behavior-based agents, and through self-play. Gameplay against these agents reveals three key findings. 1) Neural agents learn to exploit time-based agents, achieving clear transitions in decision preference values. The Cauchy distribution emerges as suitable for sampling offers, due to its peaky center and heavy tails. The kurtosis and variance sensitivity of the probability distributions used for continuous control produce trade-offs in exploration and exploitation. 2) Neural agents demonstrate adaptive behavior against different combinations of concession, discount factors, and behavior-based strategies. 3) Most importantly, neural agents learn to cooperate with other behavior-based agents, in certain cases utilizing non-credible threats to force fairer results. This bears similarities with reputation-based strategies in the evolutionary dynamics, and departs from equilibria in classical game theory.
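
The abstract's remark about the Cauchy distribution's heavy tails can be illustrated with a small sampling sketch. This is not the paper's implementation: the center `loc`, the scale `scale`, the choice to give the Gaussian a standard deviation equal to the Cauchy scale, and the clipping of offers to the interval [0, 1] are all assumptions made here for illustration. The sketch draws offers around a preferred value and measures how often each distribution lands far from it, which is one rough proxy for how much exploration the sampling scheme produces.

```python
import math
import random


def sample_cauchy(rng, loc, scale):
    """Draw one Cauchy sample using the inverse CDF: loc + scale * tan(pi * (u - 0.5))."""
    u = rng.random()
    return loc + scale * math.tan(math.pi * (u - 0.5))


def clip_offer(x):
    """Clip a raw sample to [0, 1] (assumed offer range, not taken from the paper)."""
    return min(1.0, max(0.0, x))


def tail_fraction(samples, loc, width):
    """Fraction of samples lying more than `width` away from `loc`."""
    return sum(abs(s - loc) > width for s in samples) / len(samples)


rng = random.Random(0)            # fixed seed for reproducibility
loc, scale, n = 0.5, 0.05, 20000  # hypothetical values chosen for illustration

cauchy_raw = [sample_cauchy(rng, loc, scale) for _ in range(n)]
gauss_raw = [rng.gauss(loc, scale) for _ in range(n)]

# Compare how often each distribution proposes values far from the center.
cauchy_tail = tail_fraction(cauchy_raw, loc, 3 * scale)
gauss_tail = tail_fraction(gauss_raw, loc, 3 * scale)

# Offers actually submitted would be restricted to the valid range.
offers = [clip_offer(x) for x in cauchy_raw]
```

Under these assumptions, the Cauchy samples fall outside three scale units of the center far more often than the Gaussian ones, while half of the Cauchy mass still lies within one scale unit of `loc`. That combination is the exploration and exploitation trade-off the abstract attributes to kurtosis.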
