Paper Title

Accounting for the Sequential Nature of States to Learn Features for Reinforcement Learning

Authors

Nathan Michlo, Devon Jarvis, Richard Klein, Steven James

Abstract

In this work, we investigate the properties of data that cause popular representation learning approaches to fail. In particular, we find that in environments where states do not significantly overlap, variational autoencoders (VAEs) fail to learn useful features. We demonstrate this failure in a simple gridworld domain, and then provide a solution in the form of metric learning. However, metric learning requires supervision in the form of a distance function, which is absent in reinforcement learning. To overcome this, we leverage the sequential nature of states in a replay buffer to approximate a distance metric and provide a weak supervision signal, under the assumption that temporally close states are also semantically similar. We modify a VAE with triplet loss and demonstrate that this approach is able to learn useful features for downstream tasks, without additional supervision, in environments where standard VAEs fail.
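The core idea in the abstract — using temporal proximity in a replay buffer as a weak supervision signal for a triplet loss — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the sampling windows `k_pos` and `k_neg` and the helper names are assumptions, and the triplet margin loss shown is the standard formulation that would be added to the usual VAE objective.

```python
import numpy as np

def sample_triplet(buffer, i, k_pos=1, k_neg=10, rng=None):
    """Sample (anchor, positive, negative) states from a sequential replay buffer.

    Weak supervision assumption from the abstract: temporally close states are
    semantically similar. The positive is within k_pos steps of the anchor; the
    negative is at least k_neg steps away. (k_pos and k_neg are illustrative
    hyperparameters, not taken from the paper.)
    """
    rng = rng or np.random.default_rng()
    n = len(buffer)
    pos = min(n - 1, i + int(rng.integers(1, k_pos + 1)))
    far = [j for j in range(n) if abs(j - i) >= k_neg]
    neg = int(rng.choice(far))
    return buffer[i], buffer[pos], buffer[neg]

def triplet_loss(f_a, f_p, f_n, margin=1.0):
    """Standard triplet margin loss on encoded feature vectors:
    pull the anchor toward the positive, push it from the negative."""
    d_ap = np.linalg.norm(f_a - f_p)
    d_an = np.linalg.norm(f_a - f_n)
    return max(0.0, d_ap - d_an + margin)
```

In training, the three sampled states would be passed through the VAE encoder, and `triplet_loss` on the resulting features would be added to the reconstruction and KL terms, shaping the latent space even when states do not overlap.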
