Paper Title

Safe Model-Based Reinforcement Learning with an Uncertainty-Aware Reachability Certificate

Paper Authors

Dongjie Yu, Wenjun Zou, Yujie Yang, Haitong Ma, Shengbo Eben Li, Jingliang Duan, Jianyu Chen

Paper Abstract

Safe reinforcement learning (RL), which solves for constraint-satisfying policies, provides a promising path toward broader safety-critical applications of RL in real-world problems such as robotics. Among safe RL approaches, model-based methods further reduce training-time violations due to their high sample efficiency. However, a lack of safety robustness against model uncertainties remains an issue in safe model-based RL, especially regarding training-time safety. In this paper, we propose a distributional reachability certificate (DRC) and its Bellman equation to address model uncertainties and characterize robust persistently safe states. Furthermore, we build a safe RL framework that resolves the constraints required by the DRC and its corresponding shield policy. We also devise a line search method that maintains safety while reaching higher returns when leveraging the shield policy. Comprehensive experiments on classical benchmarks such as constrained tracking and navigation indicate that the proposed algorithm achieves comparable returns with far fewer constraint violations during training.
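To make the shielding idea in the abstract concrete, below is a minimal sketch, not the authors' implementation: it assumes a scalar reachability-certificate function `certificate(state, action)` that is non-positive for safe actions, and it blends a task-policy action with a shield-policy action via a simple line search over the mixing weight. All names (`shielded_action`, `certificate`, `threshold`) are illustrative assumptions.

```python
import numpy as np

def shielded_action(task_action, shield_action, certificate, state,
                    num_steps=10, threshold=0.0):
    """Line-search interpolation between task and shield actions.

    Starts from the reward-seeking task action and backs off toward the
    shield action until the (hypothetical) reachability certificate
    judges the blended action safe. `certificate(state, action)` is
    assumed to return a scalar that is <= `threshold` for safe actions.
    """
    task_action = np.asarray(task_action)
    shield_action = np.asarray(shield_action)
    for alpha in np.linspace(1.0, 0.0, num_steps + 1):
        action = alpha * task_action + (1.0 - alpha) * shield_action
        if certificate(state, action) <= threshold:
            return action  # largest task-policy weight that is still certified safe
    return shield_action  # fall back to the pure shield action

# Toy usage with placeholder inputs: an action is "safe" if its norm is <= 0.5.
safe = shielded_action(
    task_action=np.array([1.0, 0.0]),
    shield_action=np.array([0.0, 0.0]),
    certificate=lambda s, a: float(np.linalg.norm(a)) - 0.5,
    state=None,
)
```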
