Learning Intuitive Policies Using Action Features

Paper Title

Learning Intuitive Policies Using Action Features

Authors

Mingwei Ma, Jizhou Liu, Samuel Sokota, Max Kleiman-Weiner, Jakob Foerster

Abstract

An unaddressed challenge in multi-agent coordination is to enable AI agents to exploit the semantic relationships between the features of actions and the features of observations. Humans take advantage of these relationships in highly intuitive ways. For instance, in the absence of a shared language, we might point to the object we desire or hold up our fingers to indicate how many objects we want. To address this challenge, we investigate the effect of network architecture on the propensity of learning algorithms to exploit these semantic relationships. Across a procedurally generated coordination task, we find that attention-based architectures that jointly process a featurized representation of observations and actions have a better inductive bias for learning intuitive policies. Through fine-grained evaluation and scenario analysis, we show that the resulting policies are human-interpretable. Moreover, such agents coordinate with people without training on any human data.
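To make the idea of jointly processing featurized observations and actions concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the architecture from the paper: the one-hot feature encoding, the function names (`action_logits`, `softmax`, `dot`), and the absence of learned projection weights are all hypothetical simplifications. Each action attends over the observed objects using its own feature vector as the query, so an action whose features match what is observed receives a higher score.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def action_logits(obs_features, action_features):
    """Score each action by dot-product attention over observation features.

    Assumption: a real model would apply learned query/key/value projections;
    here the raw feature vectors are used directly to keep the sketch small.
    """
    scale = math.sqrt(len(action_features[0]))
    logits = []
    for a in action_features:
        # Attention weights of this action over every observed object.
        weights = softmax([dot(a, o) / scale for o in obs_features])
        # Context vector: attention-weighted sum of observation features.
        context = [
            sum(w * o[i] for w, o in zip(weights, obs_features))
            for i in range(len(a))
        ]
        # The logit measures how well the action's features match the context.
        logits.append(dot(a, context))
    return logits

# Hypothetical toy setup: features are one-hot over [red, blue, green].
obs = [[1.0, 0.0, 0.0]]          # the observed object is "red"
actions = [
    [1.0, 0.0, 0.0],             # action associated with "red"
    [0.0, 1.0, 0.0],             # action associated with "blue"
    [0.0, 0.0, 1.0],             # action associated with "green"
]
logits = action_logits(obs, actions)
policy = softmax(logits)
```

In this toy setup the "red" action gets the largest probability even though nothing has been trained, which is one way to picture the inductive bias the abstract describes. Because every action is scored by the same function of its features, reordering the action list simply reorders the logits.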
