Paper Title
Learning Locally, Communicating Globally: Reinforcement Learning of Multi-robot Task Allocation for Cooperative Transport
Paper Authors
Paper Abstract
We consider task allocation for multi-object transport using a multi-robot system, in which each robot selects one object among multiple objects with different and unknown weights. Existing centralized methods assume fixed numbers of robots and tasks, which makes them inapplicable to scenarios that differ from the learning environment. Existing distributed methods, in contrast, limit the minimum numbers of robots and tasks to a constant value, which makes them applicable to various numbers of robots and tasks; however, they cannot transport an object whose weight exceeds the load capacity of the robots observing it. To handle various numbers of robots and objects with different and unknown weights, we propose a framework that uses multi-agent reinforcement learning for task allocation. First, we introduce a structured policy model consisting of 1) predesigned dynamic task priorities with global communication and 2) a neural-network-based distributed policy model that determines the timing of coordination. The distributed policy builds consensus on the high-priority object under local observations and selects cooperative or independent actions. The policy is then optimized by multi-agent reinforcement learning through trial and error. As demonstrated by numerical simulations, this structured policy of local learning and global communication makes our framework applicable to various numbers of robots and objects with different and unknown weights.
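To make the two-level structure described in the abstract concrete, the following is a minimal Python sketch. The priority rule, the policy stub, and all names (`dynamic_task_priorities`, `DistributedPolicyStub`, `allocate`) are illustrative assumptions, not the authors' implementation; in the paper, the distributed policy is a neural network trained with multi-agent reinforcement learning rather than the random stub used here.

```python
import numpy as np

def dynamic_task_priorities(remaining_weight, assigned_robots):
    """Predesigned dynamic priority rule (assumed heuristic, not the paper's):
    heavier unfinished objects with fewer robots already assigned rank higher."""
    return remaining_weight / (1.0 + assigned_robots)

class DistributedPolicyStub:
    """Stand-in for the learned distributed policy.

    In the paper this is a neural network trained with multi-agent RL;
    here it randomly outputs 1 (cooperate on the top-priority object)
    or 0 (keep transporting the robot's own object)."""
    def __init__(self, seed=None):
        self.rng = np.random.default_rng(seed)

    def act(self, local_obs, top_priority_object):
        return int(self.rng.integers(2))

def allocate(remaining_weight, assigned_robots, robots):
    # Global communication: every robot receives the same priority vector,
    # so all robots agree on which object currently ranks highest.
    priorities = dynamic_task_priorities(remaining_weight, assigned_robots)
    top_object = int(np.argmax(priorities))

    assignments = {}
    for robot_id, (policy, local_obs, own_object) in robots.items():
        # Local decision: each robot chooses, from its own observation,
        # whether to join the shared top-priority object (cooperative action)
        # or continue with its own object (independent action).
        cooperate = policy.act(local_obs, top_object)
        assignments[robot_id] = top_object if cooperate else own_object
    return assignments

# Usage example with three objects and two robots (all values illustrative;
# the true object weights are unknown, so these are running estimates).
remaining_weight = np.array([5.0, 2.0, 8.0])
assigned_robots = np.array([1, 0, 1])
robots = {
    0: (DistributedPolicyStub(seed=0), np.zeros(4), 0),
    1: (DistributedPolicyStub(seed=1), np.zeros(4), 1),
}
print(allocate(remaining_weight, assigned_robots, robots))
```

The property the sketch tries to preserve is the division of labor named in the title: the priority vector is computed from globally communicated quantities, so consensus on the top-ranked object is available to all robots, while the cooperate-or-not decision is made locally by each robot and is the part learned through trial and error.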