Paper Title
Reward-Predictive Clustering
Paper Authors
Paper Abstract
Recent advances in reinforcement-learning research have demonstrated impressive results in building algorithms that can outperform humans in complex tasks. Nevertheless, creating reinforcement-learning systems that can build abstractions of their experience to accelerate learning in new contexts remains an active area of research. Previous work showed that reward-predictive state abstractions fulfill this goal, but they have only been applied to tabular settings. Here, we provide a clustering algorithm that enables the application of such state abstractions to deep-learning settings, providing compressed representations of an agent's inputs that preserve the ability to predict sequences of reward. A convergence theorem and simulations show that the resulting reward-predictive deep network maximally compresses the agent's inputs, significantly speeding up learning in high-dimensional visual control tasks. Furthermore, we present different generalization experiments and analyze under which conditions a pre-trained reward-predictive representation network can be re-used without re-training to accelerate learning, a form of systematic out-of-distribution transfer.
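For intuition, here is a minimal sketch of the clustering criterion the abstract describes: states are grouped into the same cluster when their predicted reward sequences agree. The function name `reward_predictive_clusters`, the tolerance parameter, and the fixed-horizon array layout are illustrative assumptions, not the paper's actual algorithm or API, which operates on learned deep representations rather than a precomputed table.

```python
import numpy as np

def reward_predictive_clusters(reward_seqs, tol=1e-5):
    """Group states whose predicted reward sequences agree within `tol`.

    reward_seqs: (num_states, horizon) array; row i holds the reward
    sequence predicted from state i under some fixed action sequence.
    Returns an array of cluster labels, one per state.
    """
    num_states = reward_seqs.shape[0]
    labels = -np.ones(num_states, dtype=int)
    next_label = 0
    for i in range(num_states):
        if labels[i] >= 0:
            continue  # already assigned to an earlier cluster
        labels[i] = next_label
        # Merge every later state whose predicted rewards match state i's.
        for j in range(i + 1, num_states):
            if labels[j] < 0 and np.max(np.abs(reward_seqs[i] - reward_seqs[j])) <= tol:
                labels[j] = next_label
        next_label += 1
    return labels

# Example: four states with three-step reward predictions; states 0 and 2
# predict identical reward sequences, so they collapse into one cluster.
seqs = np.array([[0.0, 1.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
print(reward_predictive_clusters(seqs))  # -> [0 1 0 2]
```

The compressed representation is then the cluster label itself: two inputs that map to the same label are treated as interchangeable for predicting future rewards, which is what allows the abstraction to shrink the agent's input space without losing reward information.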