Paper Title


A model-based approach to meta-Reinforcement Learning: Transformers and tree search

Paper Authors

Pinon, Brieuc, Delvenne, Jean-Charles, Jungers, Raphaël

Paper Abstract


Meta-learning is a line of research that develops the ability to leverage past experience to efficiently solve new learning problems. Meta-Reinforcement Learning (meta-RL) methods demonstrate the capability to learn behaviors that efficiently acquire and exploit information on several meta-RL problems. In this context, Wang et al. [2021] proposed the Alchemy benchmark. Alchemy features a rich structured latent space that is challenging for state-of-the-art model-free RL methods, which fail to learn to properly explore and then exploit. We develop a model-based algorithm: we train a model whose principal block is a Transformer encoder to fit the symbolic Alchemy environment dynamics, and we then define an online planner that applies a tree search method to the learned model. This algorithm significantly outperforms previously applied model-free RL methods on the symbolic Alchemy problem. Our results reveal the relevance of model-based approaches with online planning for performing exploration and exploitation successfully in meta-RL. Moreover, they demonstrate the efficiency of the Transformer architecture at learning the complex dynamics that arise from the latent spaces present in meta-RL problems.
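The high-level recipe in the abstract — learn a dynamics model, then plan online with tree search — can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: `learned_model` is a toy deterministic function playing the role of the paper's trained Transformer encoder, and the planner is a naive exhaustive tree search rather than the authors' actual method.

```python
import itertools

def learned_model(state, action):
    """Hypothetical stand-in for a learned dynamics model.

    Returns (next_state, predicted_reward). Here the toy dynamics simply
    add the action to the state, and reward peaks when the state equals 10.
    """
    next_state = state + action
    reward = -abs(next_state - 10)
    return next_state, reward

def tree_search_plan(state, actions=(-1, 0, 1), depth=3):
    """Exhaustively search all action sequences up to `depth` through the
    model, and return the first action of the sequence with the highest
    predicted cumulative reward."""
    best_seq, best_ret = None, float("-inf")
    for seq in itertools.product(actions, repeat=depth):
        s, ret = state, 0.0
        for a in seq:
            s, r = learned_model(s, a)
            ret += r
        if ret > best_ret:
            best_seq, best_ret = seq, ret
    return best_seq[0]
```

In an online-planning loop, the agent would execute only this first action in the real environment, observe the outcome, and replan at every step (a receding-horizon scheme), so that planning always starts from the true current state rather than from model rollouts alone.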
