Paper Title


Efficient Exploration using Model-Based Quality-Diversity with Gradients

Authors

Bryan Lim, Manon Flageat, Antoine Cully

Abstract


Exploration is a key challenge in Reinforcement Learning, especially in long-horizon, deceptive and sparse-reward environments. For such applications, population-based approaches have proven effective. Methods such as Quality-Diversity deal with this by encouraging novel solutions and producing a diversity of behaviours. However, these methods are driven by either undirected sampling (i.e. mutations) or approximated gradients (i.e. Evolution Strategies) in the parameter space, which makes them highly sample-inefficient. In this paper, we propose a model-based Quality-Diversity approach. It extends existing QD methods to use gradients for efficient exploitation and leverage perturbations in imagination for efficient exploration. Our approach optimizes all members of a population simultaneously to maintain both performance and diversity efficiently by leveraging the effectiveness of QD algorithms as good data generators to train deep models. We demonstrate that it maintains the divergent search capabilities of population-based approaches on tasks with deceptive rewards while significantly improving their sample efficiency and quality of solutions.
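To make the Quality-Diversity mechanism referenced in the abstract concrete, below is a minimal MAP-Elites-style sketch: an archive keeps the best solution per behaviour niche, and new solutions are produced by undirected Gaussian mutation, the sample-inefficient operator the paper proposes to replace with gradient-based and imagined updates. The objective, behaviour descriptor, and all parameters here are toy choices for illustration only, not the paper's algorithm or tasks.

```python
import random

def fitness(x):
    # Toy objective: negative squared distance from the origin (higher is better).
    return -(x[0] ** 2 + x[1] ** 2)

def behaviour(x):
    # Toy 1-D behaviour descriptor: discretize x[0] into 10 bins over [-1, 1].
    return min(9, max(0, int((x[0] + 1.0) / 0.2)))

def mutate(x, sigma=0.1):
    # Undirected Gaussian mutation -- the divergent but sample-inefficient
    # search operator that classical QD relies on.
    return [xi + random.gauss(0.0, sigma) for xi in x]

def map_elites(iterations=1000, seed=0):
    random.seed(seed)
    archive = {}  # behaviour bin -> (solution, fitness)
    for _ in range(iterations):
        if archive:
            parent, _ = random.choice(list(archive.values()))
            child = mutate(parent)
        else:
            child = [random.uniform(-1, 1), random.uniform(-1, 1)]
        b, f = behaviour(child), fitness(child)
        # Keep the child only if its niche is empty or it beats the incumbent.
        if b not in archive or f > archive[b][1]:
            archive[b] = (child, f)
    return archive

archive = map_elites()
print(len(archive), "niches filled")
```

The resulting archive is exactly the kind of diverse, high-quality dataset the abstract describes QD algorithms producing, which the proposed method exploits to train deep models.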
