The Frost Hollow Experiments: Pavlovian Signalling as a Path to Coordination and Communication Between Agents

Paper Title

The Frost Hollow Experiments: Pavlovian Signalling as a Path to Coordination and Communication Between Agents

Authors

Pilarski, Patrick M., Butcher, Andrew, Davoodi, Elnaz, Johanson, Michael Bradley, Brenneis, Dylan J. A., Parker, Adam S. R., Acker, Leslie, Botvinick, Matthew M., Modayil, Joseph, White, Adam

Abstract

Learned communication between agents is a powerful tool when approaching decision-making problems that are hard to overcome by any single agent in isolation. However, continual coordination and communication learning between machine agents or human-machine partnerships remains a challenging open problem. As a stepping stone toward solving the continual communication learning problem, in this paper we contribute a multi-faceted study into what we term Pavlovian signalling -- a process by which learned, temporally extended predictions made by one agent inform decision-making by another agent with different perceptual access to their shared environment. We seek to establish how different temporal processes and representational choices impact Pavlovian signalling between learning agents. To do so, we introduce a partially observable decision-making domain we call the Frost Hollow. In this domain a prediction learning agent and a reinforcement learning agent are coupled into a two-part decision-making system that seeks to acquire sparse reward while avoiding time-conditional hazards. We evaluate two domain variations: 1) machine prediction and control learning in a linear walk, and 2) a prediction learning machine interacting with a human participant in a virtual reality environment. Our results showcase the speed of learning for Pavlovian signalling, the impact that different temporal representations do (and do not) have on agent-agent coordination, and how temporal aliasing impacts agent-agent and human-agent interactions differently. As a main contribution, we establish Pavlovian signalling as a natural bridge between fixed signalling paradigms and fully adaptive communication learning. Our results therefore point to an actionable, constructivist path towards continual communication learning between reinforcement learning agents, with potential impact in a range of real-world settings.
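
The signalling process described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not specify the learning algorithm, the representations, or the environment dynamics, so everything below is an assumption made for illustration. The sketch assumes a hazard that recurs with a fixed period, a tabular TD(0) learner that predicts the discounted hazard signal from the number of steps since the last hazard, and a fixed threshold that turns the prediction into a binary token a second agent could read. The function name and all parameters (`period`, `gamma`, `alpha`, `threshold`, `steps`) are hypothetical.

```python
def train_pavlovian_signal(period=10, gamma=0.8, alpha=0.1,
                           threshold=0.5, steps=5000):
    """Toy sketch: learn a hazard prediction with tabular TD(0),
    then threshold it into a binary token (assumed setup, not the paper's)."""
    # One prediction per phase, where phase = steps since the last hazard.
    values = [0.0] * period
    phase = 0
    for _ in range(steps):
        next_phase = (phase + 1) % period
        # Assumption: the hazard fires whenever the cycle wraps to phase 0.
        cumulant = 1.0 if next_phase == 0 else 0.0
        # Standard TD(0) update toward cumulant + discounted next prediction.
        td_error = cumulant + gamma * values[next_phase] - values[phase]
        values[phase] += alpha * td_error
        phase = next_phase
    # The token is "on" in phases where the learned prediction is high,
    # i.e. where the hazard is expected soon.
    tokens = [1 if v > threshold else 0 for v in values]
    return values, tokens


if __name__ == "__main__":
    values, tokens = train_pavlovian_signal()
    for phase, (v, t) in enumerate(zip(values, tokens)):
        print(f"phase {phase}: prediction={v:.3f} token={t}")
```

In this sketch the receiving agent never observes the phase itself; it would see only the token, which stands in for the "different perceptual access" mentioned in the abstract. Raising `gamma` or lowering `threshold` makes the token switch on earlier before each hazard.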
