Paper Title
Reinforcement Learning for Location-Aware Scheduling
Paper Authors
Paper Abstract
Recent techniques in dynamic scheduling and resource management have found applications in warehouse environments due to their ability to organize and prioritize tasks at a higher temporal resolution. The rise of deep reinforcement learning as a learning paradigm has enabled decentralized agent populations to discover complex coordination strategies. However, training multiple agents simultaneously introduces many obstacles, as observation and action spaces become exponentially large. In our work, we experimentally quantify how various aspects of the warehouse environment (e.g., floor plan complexity, information about agents' live locations, level of task parallelizability) affect performance and execution priority. To achieve efficiency, we propose a compact representation of the state and action space for location-aware multi-agent systems, wherein each agent has knowledge of only its own and task coordinates, and hence only partial observability of the underlying Markov Decision Process. Finally, we show how agents trained in certain environments maintain performance in completely unseen settings, and we correlate performance degradation with floor plan geometry.
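The compact, location-aware representation described above can be illustrated with a minimal sketch. This is an assumption about one plausible encoding, not the paper's actual implementation: each agent's observation is a fixed-size vector holding only its own (x, y) position and the coordinates of pending tasks, with a sentinel value padding empty task slots. The function name `compact_observation` and the padding scheme are hypothetical.

```python
import numpy as np

def compact_observation(agent_xy, task_xys, max_tasks=4):
    """Build a fixed-size observation from self and task coordinates only.

    agent_xy  : (x, y) position of this agent.
    task_xys  : list of (x, y) task positions, truncated to max_tasks.
    Empty task slots are padded with a sentinel (-1, -1), so the vector
    length is constant regardless of how many tasks are outstanding.
    """
    obs = np.full((1 + max_tasks, 2), -1.0)  # sentinel-filled buffer
    obs[0] = agent_xy                        # slot 0: the agent itself
    for i, xy in enumerate(task_xys[:max_tasks]):
        obs[1 + i] = xy                      # remaining slots: tasks
    return obs.flatten()

# The agent sees its own position and two task coordinates; everything
# else about the global state (other agents, floor plan) is hidden,
# making the underlying MDP only partially observable from its view.
obs = compact_observation((3, 7), [(1, 2), (5, 5)])
```

Keeping the observation length fixed at `2 * (1 + max_tasks)` lets a single policy network be shared across agents and warehouse layouts, which is consistent with the transfer to unseen settings the abstract reports.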