论文标题
ZIPFIAN的增强学习环境
Zipfian environments for Reinforcement Learning
论文作者
论文摘要
正如人类和动物在自然世界中学习的那样,它们会遇到远非统一的实体,情况和事件的分布。通常,经常遇到相对较小的经历,而许多重要的体验很少发生。现实的高度紧密,重尾的本质构成了人类和动物通过不断发展的专业记忆系统所面临的特殊学习挑战。相比之下,大多数流行的RL环境和基准涉及属性,对象,情况或任务的大致变化。 RL算法在环境特征的分布分布远不那么均匀的世界中如何执行?为了探讨这个问题,我们开发了三个互补的RL环境,在这些环境中,代理商的经验会根据Zipfian(离散幂定律)分布而变化。在这些基准上,我们发现标准的深度RL体系结构和算法获得了对常见情况和任务的有用知识,但无法充分了解稀有的情况。为了更好地理解这一失败,我们探讨了如何调整当前方法的不同方面,以帮助提高罕见事件的性能,并表明RL目标功能,代理商的记忆系统和自我监督的学习目标都可以影响代理商从罕见的经验中学习的能力。这些结果共同表明,从偏斜的经验中进行强大的学习是应用模拟或实验室以外的深度RL方法的关键挑战,而我们的Zipfian环境为衡量未来的进步朝着这一目标提供了基础。
As humans and animals learn in the natural world, they encounter distributions of entities, situations and events that are far from uniform. Typically, a relatively small set of experiences are encountered frequently, while many important experiences occur only rarely. The highly-skewed, heavy-tailed nature of reality poses particular learning challenges that humans and animals have met by evolving specialised memory systems. By contrast, most popular RL environments and benchmarks involve approximately uniform variation of properties, objects, situations or tasks. How will RL algorithms perform in worlds (like ours) where the distribution of environment features is far less uniform? To explore this question, we develop three complementary RL environments where the agent's experience varies according to a Zipfian (discrete power law) distribution. On these benchmarks, we find that standard Deep RL architectures and algorithms acquire useful knowledge of common situations and tasks, but fail to adequately learn about rarer ones. To understand this failure better, we explore how different aspects of current approaches may be adjusted to help improve performance on rare events, and show that the RL objective function, the agent's memory system and self-supervised learning objectives can all influence an agent's ability to learn from uncommon experiences. Together, these results show that learning robustly from skewed experience is a critical challenge for applying Deep RL methods beyond simulations or laboratories, and our Zipfian environments provide a basis for measuring future progress towards this goal.