Paper title
Curiosity creates Diversity in Policy Search
Paper authors
Paper abstract
When searching for policies, reward-sparse environments often lack sufficient information about which behaviors to improve upon or avoid. In such environments, the policy search process is bound to blindly search for reward-yielding transitions and no early reward can bias this search in one direction or another. A way to overcome this is to use intrinsic motivation in order to explore new transitions until a reward is found. In this work, we use a recently proposed definition of intrinsic motivation, Curiosity, in an evolutionary policy search method. We propose Curiosity-ES, an evolutionary strategy adapted to use Curiosity as a fitness metric. We compare Curiosity with Novelty, a commonly used diversity metric, and find that Curiosity can generate higher diversity over full episodes without the need for an explicit diversity criterion and lead to multiple policies which find reward.
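The idea of using Curiosity as the fitness metric in an evolutionary search can be sketched roughly as follows. This is a minimal toy illustration, not the authors' Curiosity-ES implementation: it assumes that curiosity is measured as the squared prediction error of a learned forward model, and the 1-D environment, the `ForwardModel` class, the truncation-selection scheme, and all parameter names are hypothetical choices made only for this example.

```python
import random


def rollout(theta, steps=20):
    """Run a toy 1-D policy; theta is cycled through as a sequence of actions."""
    pos, transitions = 0.0, []
    for t in range(steps):
        action = max(-1.0, min(1.0, theta[t % len(theta)]))  # clip to [-1, 1]
        nxt = pos + action  # toy dynamics (assumption): next = pos + action
        transitions.append((pos, action, nxt))
        pos = nxt
    return transitions


class ForwardModel:
    """Hypothetical forward model predicting next = pos + w * action."""

    def __init__(self):
        self.w = 0.0

    def error(self, pos, action, nxt):
        # Squared prediction error, used here as the curiosity signal (assumption).
        return (nxt - (pos + self.w * action)) ** 2

    def update(self, transitions, lr=0.05):
        # One gradient step on the squared error per observed transition.
        for pos, action, nxt in transitions:
            pred = pos + self.w * action
            self.w += lr * (nxt - pred) * action


def curiosity_es(generations=30, pop_size=16, elite=4, dim=5, sigma=0.3, seed=0):
    """Evolve policies whose fitness is episode-level curiosity, not reward."""
    rng = random.Random(seed)
    model = ForwardModel()
    population = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        episodes = [rollout(theta) for theta in population]
        # Fitness: total forward-model error accumulated over the full episode.
        fitness = [sum(model.error(*tr) for tr in ep) for ep in episodes]
        # Train the model on what was just seen, so familiar transitions
        # become less rewarding in later generations.
        for ep in episodes:
            model.update(ep)
        # Truncation selection plus Gaussian mutation (a simple ES variant).
        ranked = sorted(range(pop_size), key=lambda i: fitness[i], reverse=True)
        parents = [population[i] for i in ranked[:elite]]
        population = [
            [gene + rng.gauss(0, sigma) for gene in rng.choice(parents)]
            for _ in range(pop_size)
        ]
    return population, model
```

Calling `curiosity_es()` returns the final population and the trained forward model. Because the toy dynamics are trivially learnable, the curiosity signal fades quickly in this sketch; in a reward-sparse environment with richer dynamics, the same loop would keep favoring policies that reach transitions the model has not yet learned to predict.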