A deep inverse reinforcement learning approach to route choice modeling with context-dependent rewards

Paper title

A deep inverse reinforcement learning approach to route choice modeling with context-dependent rewards

Authors

Zhao, Zhan; Liang, Yuebing

Abstract

Route choice modeling is a fundamental task in transportation planning and demand forecasting. Classical methods generally adopt the discrete choice model (DCM) framework with linear utility functions and high-level route characteristics. While several recent studies have started to explore the applicability of deep learning for route choice modeling, they are limited to path-based models with relatively simple model architectures and rely on predefined choice sets. Existing link-based models can capture the dynamic nature of link choices within the trip without the need for choice set generation, but still assume linear relationships and link-additive features. To address these issues, this study proposes a general deep inverse reinforcement learning (IRL) framework for link-based route choice modeling, which is capable of incorporating diverse features (of the state, action and trip context) and capturing complex relationships. Specifically, we adapt an adversarial IRL model to the route choice problem for efficient estimation of context-dependent reward functions without value iteration. Experimental results based on taxi GPS data from Shanghai, China validate the superior prediction performance of the proposed model over conventional DCMs and other imitation learning baselines, even for destinations unseen in the training data. Further analysis shows that the model exhibits competitive computational efficiency and reasonable interpretability. The proposed methodology provides a new direction for future development of route choice models. It is general and can be adapted to other route choice problems across different modes and networks.
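
As a rough illustration of the adversarial IRL idea mentioned in the abstract, the sketch below shows the discriminator form commonly used in adversarial IRL, D = exp(f) / (exp(f) + pi), where f is a learned reward-like score for a link choice and pi is the current policy's probability of that choice. Whether the paper uses exactly this form is an assumption; the function and parameter names (`airl_discriminator`, `policy_reward_signal`, `f_value`, `policy_prob`) are hypothetical and chosen only for this example.

```python
import math


def airl_discriminator(f_value: float, policy_prob: float) -> float:
    """Generic adversarial-IRL discriminator (assumed form, not taken from the paper).

    D = exp(f) / (exp(f) + pi), computed as sigmoid(f - log pi) for stability.
    f_value: learned score for a (state, action, context) tuple (hypothetical input).
    policy_prob: probability the current policy assigns to that action, in (0, 1].
    """
    logit = f_value - math.log(policy_prob)
    return 1.0 / (1.0 + math.exp(-logit))


def policy_reward_signal(f_value: float, policy_prob: float) -> float:
    """Reward passed to the policy update: log D - log(1 - D) = f - log pi."""
    d = airl_discriminator(f_value, policy_prob)
    return math.log(d) - math.log1p(-d)
```

When the score f equals log pi, the discriminator cannot tell expert choices from policy choices and returns 0.5. Because the policy is trained directly against this signal, no inner value-iteration loop is needed in this sketch.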

Paper title