Paper Title
Causal Curiosity: RL Agents Discovering Self-supervised Experiments for Causal Representation Learning
Paper Authors
Paper Abstract
Animals exhibit an innate ability to learn regularities of the world through interaction. By performing experiments in their environment, they are able to discern the causal factors of variation and infer how they affect the world's dynamics. Inspired by this, we attempt to equip reinforcement learning agents with the ability to perform experiments that facilitate a categorization of the rolled-out trajectories, and to subsequently infer the causal factors of the environment in a hierarchical manner. We introduce {\em causal curiosity}, a novel intrinsic reward, and show that it allows our agents to learn optimal sequences of actions and discover causal factors in the dynamics of the environment. The learned behavior allows the agents to infer a binary quantized representation for the ground-truth causal factors in every environment. Additionally, we find that these experimental behaviors are semantically meaningful (e.g., our agents learn to lift blocks to categorize them by weight), and are learned in a self-supervised manner with approximately 2.5 times less data than conventional supervised planners. We show that these behaviors can be re-purposed and fine-tuned (e.g., from lifting to pushing or other downstream tasks). Finally, we show that knowledge of the causal factor representations aids zero-shot learning for more complex tasks. Visit https://sites.google.com/usc.edu/causal-curiosity/home for the project website.
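To make the core idea concrete, below is a minimal, hypothetical sketch of what a "causal curiosity"-style intrinsic reward could look like for a one-dimensional outcome. The abstract describes a reward that drives agents toward behaviors whose rolled-out trajectories separate environments into binary categories (e.g., lifting separates light from heavy blocks); the specific clustering method, reward formula, and function names here (`binary_cluster`, `causal_curiosity_reward`) are illustrative assumptions, not the paper's actual implementation.

```python
from statistics import mean, pstdev

def binary_cluster(values, n_iter=20):
    """Toy 1-D 2-means: split scalar trajectory outcomes into two groups
    (e.g. 'block lifted high' vs 'block barely moved')."""
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assign each outcome to its nearest cluster center.
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                  for v in values]
        # Recompute each center as the mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = mean(members)
    return labels, centers

def causal_curiosity_reward(outcomes):
    """Hypothetical intrinsic reward: how cleanly one candidate behavior's
    outcomes, collected across several environments, split into two clusters.
    High when between-cluster separation dominates within-cluster spread."""
    labels, centers = binary_cluster(outcomes)
    within = 0.0
    for k in (0, 1):
        members = [v for v, lab in zip(outcomes, labels) if lab == k]
        if len(members) > 1:
            within += pstdev(members)
    between = abs(centers[1] - centers[0])
    return between / (within + 1e-8)

# Toy example: final block heights after one candidate "lift" behavior,
# executed in 6 environments whose hidden causal factor is block mass.
light = [0.95, 1.02, 0.98]   # light blocks end up lifted high
heavy = [0.11, 0.08, 0.13]   # heavy blocks barely move
r_lift = causal_curiosity_reward(light + heavy)

# The same reward for an uninformative behavior whose outcomes are
# nearly identical in every environment.
r_idle = causal_curiosity_reward([0.5 + 1e-3 * i for i in range(6)])
print(r_lift > r_idle)  # True: lifting separates environments by mass
```

Maximizing such a reward would favor action sequences whose outcomes bimodally partition the environments, and the resulting cluster label (0 or 1) per environment is one plausible reading of the "binary quantized representation" of the ground-truth causal factor mentioned in the abstract.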