Behavior Planning at Urban Intersections through Hierarchical Reinforcement Learning

Paper Title

Behavior Planning at Urban Intersections through Hierarchical Reinforcement Learning

Authors

Qiao, Zhiqian; Schneider, Jeff; Dolan, John M.

Abstract

For autonomous vehicles, effective behavior planning is crucial to ensure the safety of the ego car. In many urban scenarios, it is hard to create sufficiently general heuristic rules, especially for challenging scenarios that some new human drivers find difficult. In this work, we propose a behavior planning structure based on reinforcement learning (RL) that is capable of performing autonomous vehicle behavior planning with a hierarchical structure in simulated urban environments. Application of the hierarchical structure allows the various layers of the behavior planning system to be satisfied. Our algorithms can perform better than heuristic-rule-based methods for elective decisions, such as when to turn left between vehicles approaching from the opposite direction, or whether to change lanes when approaching an intersection because of a lane blockage or delay in front of the ego car. Such behavior is hard to evaluate as correct or incorrect, but some aggressive expert human drivers handle such scenarios effectively and quickly. On the other hand, compared to traditional RL methods, our algorithm is more sample-efficient, due to the use of a hybrid reward mechanism and heuristic exploration during the training process. The results also show that the proposed method converges to an optimal policy faster than traditional RL methods.
