Self-Awareness Safety of Deep Reinforcement Learning in Road Traffic Junction Driving

Paper Title

Self-Awareness Safety of Deep Reinforcement Learning in Road Traffic Junction Driving

Authors

Cao, Zehong; Yun, Jie

Abstract

Autonomous driving has been at the forefront of public interest, and safety in the transportation system is a pivotal point of debate amid widespread concern. Deep reinforcement learning (DRL) has been applied to autonomous driving to provide solutions for obstacle avoidance. However, in a road traffic junction scenario, the vehicle typically receives only partial observations from the transportation environment, while DRL relies on long-term rewards to train a reliable model by maximising the cumulative reward. This can be risky, because exploring a new action may return either a positive reward or, in the case of a collision, a penalty. Although safety concerns are usually considered in the design of a reward function, they are not fully treated as a critical metric for directly evaluating the effectiveness of DRL algorithms in autonomous driving. In this study, we evaluated the safety performance of three baseline DRL models (DQN, A2C, and PPO) and proposed a self-awareness module, derived from an attention mechanism, for DRL to improve the safety evaluation for an anomalous vehicle in complex road traffic junction environments, such as intersection and roundabout scenarios, based on four metrics: collision rate, success rate, freezing rate, and total reward. Our experimental results in both the training and testing phases revealed that the baseline DRL models have poor safety performance, while our proposed self-awareness attention-DQN can significantly improve safety performance in intersection and roundabout scenarios.
