Paper Title


Sample-Efficient Multi-Agent Reinforcement Learning with Demonstrations for Flocking Control

Authors

Yunbo Qiu, Yuzhu Zhan, Yue Jin, Jian Wang, Xudong Zhang

Abstract


Flocking control is a significant problem in multi-agent systems such as multi-agent unmanned aerial vehicles and multi-agent autonomous underwater vehicles, which enhances the cooperativity and safety of agents. In contrast to traditional methods, multi-agent reinforcement learning (MARL) solves the problem of flocking control more flexibly. However, methods based on MARL suffer from sample inefficiency, since they require a huge number of experiences to be collected from interactions between agents and the environment. We propose a novel method Pretraining with Demonstrations for MARL (PwD-MARL), which can utilize non-expert demonstrations collected in advance with traditional methods to pretrain agents. During the process of pretraining, agents learn policies from demonstrations by MARL and behavior cloning simultaneously, and are prevented from overfitting demonstrations. By pretraining with non-expert demonstrations, PwD-MARL improves sample efficiency in the process of online MARL with a warm start. Experiments show that PwD-MARL improves sample efficiency and policy performance in the problem of flocking control, even with bad or few demonstrations.
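The abstract describes pretraining agents on demonstrations with a MARL policy-gradient term and a behavior-cloning term applied simultaneously. A minimal sketch of that combined update is below, under loud assumptions: a linear deterministic policy, a hand-made toy critic standing in for the learned MARL critics, and synthetic non-expert demonstrations. The names (`critic_grad`, `lam`, the weight matrices) are illustrative, not from the paper, and the paper's safeguard against overfitting demonstrations is not modeled here.

```python
import numpy as np

# Sketch of PwD-MARL-style pretraining: one gradient step mixes a
# policy-gradient term (ascend the critic) with a behavior-cloning term
# (regress toward demonstration actions). All components are toy stand-ins.

rng = np.random.default_rng(0)

# Non-expert demonstrations: states, and the noisy actions a traditional
# controller (here, a known linear law plus noise) took in them.
W_demo = np.array([[0.5], [-0.2], [0.1]])
demo_states = rng.normal(size=(64, 3))
demo_actions = demo_states @ W_demo + 0.05 * rng.normal(size=(64, 1))

W = np.zeros((3, 1))  # linear policy: action = state @ W


def critic_grad(states, actions):
    # Toy critic Q(s, a) = -||a - s @ W_demo||^2; its action-gradient
    # stands in for the gradient of a learned MARL critic.
    return -2.0 * (actions - states @ W_demo)


lam, lr = 0.5, 0.05  # behavior-cloning weight and learning rate
for _ in range(200):
    actions = demo_states @ W
    # MARL term: chain the critic's action-gradient through the policy.
    g_marl = demo_states.T @ critic_grad(demo_states, actions) / len(demo_states)
    # Behavior-cloning term: descend the squared error to demo actions.
    g_bc = -2.0 * demo_states.T @ (actions - demo_actions) / len(demo_states)
    W += lr * (g_marl + lam * g_bc)

bc_error = float(np.mean((demo_states @ W - demo_actions) ** 2))
```

After training, the policy weights sit close to the demonstrator's and the remaining imitation error is roughly the demonstrations' noise floor; in the actual method both terms would instead be computed with learned multi-agent critics during online interaction.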
