Paper title
Reward Propagation Using Graph Convolutional Networks
Paper authors
Abstract
Potential-based reward shaping provides an approach for designing good reward functions, with the purpose of speeding up learning. However, automatically finding potential functions for complex environments is a difficult problem (in fact, of the same difficulty as learning a value function from scratch). We propose a new framework for learning potential functions by leveraging ideas from graph representation learning. Our approach relies on Graph Convolutional Networks which we use as a key ingredient in combination with the probabilistic inference view of reinforcement learning. More precisely, we leverage Graph Convolutional Networks to perform message passing from rewarding states. The propagated messages can then be used as potential functions for reward shaping to accelerate learning. We verify empirically that our approach can achieve considerable improvements in both small and high-dimensional control problems.
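The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a small tabular state graph given as an adjacency matrix, and it replaces the trained Graph Convolutional Network with a few fixed, weight-free propagation steps using the standard symmetric normalization of the adjacency matrix. The names `normalized_adjacency`, `propagate_potential`, and `shaped_reward` are hypothetical helpers introduced only for this example. The shaping term itself, gamma * Phi(s') - Phi(s), is the usual potential-based form.

```python
import numpy as np


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the usual GCN propagation matrix."""
    a_hat = adj + np.eye(adj.shape[0])       # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def propagate_potential(adj, reward_signal, num_steps=4):
    """Diffuse a reward indicator over the graph (assumption: no learned weights)."""
    a_norm = normalized_adjacency(adj)
    phi = np.asarray(reward_signal, dtype=float)
    for _ in range(num_steps):
        phi = a_norm @ phi                   # one round of message passing
    return phi


def shaped_reward(r, phi, s, s_next, gamma=0.99):
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return r + gamma * phi[s_next] - phi[s]


# Toy example: a 5-state chain 0-1-2-3-4 where only state 4 is rewarding.
n = 5
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

reward_signal = np.zeros(n)
reward_signal[-1] = 1.0

phi = propagate_potential(adj, reward_signal, num_steps=4)
bonus = shaped_reward(0.0, phi, s=0, s_next=1)  # stepping toward the goal
print(phi, bonus)
```

In this toy chain the propagated signal reaches state 0 after four steps, so a step from state 0 toward the rewarding state receives a positive shaped reward even though the environment reward there is zero. The number of propagation steps is a free choice in this sketch; too many steps with this normalization would wash out the distance information.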