Learning from Symmetry: Meta-Reinforcement Learning with Symmetrical Behaviors and Language Instructions

Paper Title

Learning from Symmetry: Meta-Reinforcement Learning with Symmetrical Behaviors and Language Instructions

Authors

Yao, Xiangtong; Bing, Zhenshan; Zhuang, Genghang; Chen, Kejia; Zhou, Hongkuan; Huang, Kai; Knoll, Alois

Abstract

Meta-reinforcement learning (meta-RL) is a promising approach that enables agents to learn new tasks quickly. However, most meta-RL algorithms generalize poorly in multi-task scenarios because rewards alone provide insufficient task information. Language-conditioned meta-RL improves generalization by matching language instructions with the agent's behaviors. Moreover, both behaviors and language instructions exhibit symmetry, which can help humans learn new knowledge faster. Combining symmetry and language instructions in meta-RL can therefore improve the algorithm's generalization and learning efficiency. We propose a dual-MDP meta-reinforcement learning method that learns new tasks efficiently from symmetrical behaviors and language instructions. We evaluate our method on multiple challenging manipulation tasks, and the experimental results show that it substantially improves the generalization and learning efficiency of meta-reinforcement learning. Videos are available at https://tumi6robot.wixsite.com/symmetry/.
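To make the idea of paired symmetry in behaviors and instructions concrete, here is a minimal sketch that mirrors a demonstration together with its language instruction. It is an illustration under stated assumptions, not the authors' dual-MDP method: it assumes a behavior's symmetric counterpart is its time reversal with negated displacement actions, and that the instruction's counterpart is obtained by swapping verbs through a hypothetical antonym table (`ANTONYMS`). The names `Demo`, `mirror_instruction`, and `mirror_demo` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical antonym table; these verb pairs are illustrative assumptions,
# not taken from the paper.
ANTONYMS: Dict[str, str] = {"open": "close", "close": "open", "push": "pull", "pull": "push"}

Step = Tuple[tuple, tuple, tuple]  # (state, action, next_state)


@dataclass
class Demo:
    instruction: str
    steps: List[Step]  # states and actions are plain float tuples


def mirror_instruction(text: str) -> str:
    """Swap every word that has a listed antonym; keep all other words."""
    return " ".join(ANTONYMS.get(word, word) for word in text.split())


def mirror_demo(demo: Demo) -> Demo:
    """Build the symmetric demo: reverse time, swap state and next_state,
    and negate each action (assumes actions are reversible displacements)."""
    mirrored = [
        (s_next, tuple(-a for a in action), s)
        for (s, action, s_next) in reversed(demo.steps)
    ]
    return Demo(mirror_instruction(demo.instruction), mirrored)


# Usage: a 1-D "drawer" whose state is its position and whose action is a delta.
demo = Demo("open the drawer", [((0.0,), (0.1,), (0.1,)), ((0.1,), (0.1,), (0.2,))])
sym = mirror_demo(demo)
print(sym.instruction)  # close the drawer
```

Mirroring twice returns the original demonstration, which is a quick sanity check that the time reversal and action negation are consistent with each other.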
