Paper title
Solving the vehicle routing problem with deep reinforcement learning
Paper authors
Abstract
Recently, the application of Reinforcement Learning (RL) methods to NP-hard combinatorial optimization problems has become a popular topic. This is essentially due to the nature of traditional combinatorial algorithms, which are often based on a trial-and-error process; RL aims at automating this process. In this regard, this paper focuses on the application of RL to the Vehicle Routing Problem (VRP), a well-known combinatorial problem that belongs to the class of NP-hard problems. In this work, the problem is first modeled as a Markov Decision Process (MDP), and then the PPO method (which belongs to the Actor-Critic class of RL methods) is applied. In a second phase, the neural architecture behind the Actor and the Critic is established, adopting an architecture based on convolutional neural networks for both. This choice makes it possible to address problems of different sizes effectively. Experiments performed on a wide range of instances show that the algorithm has good generalization capabilities and can reach good solutions in a short time. A comparison between the proposed algorithm and the state-of-the-art solver OR-Tools shows that the latter still outperforms the RL algorithm. However, several future research directions aim at improving the current performance of the proposed algorithm.
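The abstract names PPO as the training method but gives no formula. As an illustrative sketch only (not the authors' code; function and variable names are hypothetical), PPO's clipped surrogate objective, which the Actor would minimize, can be written in plain NumPy:

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate loss used by PPO.

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the behaviour policy; advantages: estimated advantages A_t.
    Returns the scalar loss to minimise (negative clipped surrogate).
    """
    # Importance ratio r_t = pi_new(a_t|s_t) / pi_old(a_t|s_t)
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    unclipped = ratio * advantages
    # Clipping the ratio to [1 - eps, 1 + eps] bounds the policy update
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))
```

In a VRP setting, the actions would be the next customer chosen at each decoding step, and the advantages would come from the CNN-based Critic mentioned in the abstract.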