Safe Reinforcement Learning with Contrastive Risk Prediction

Paper Title

Safe Reinforcement Learning with Contrastive Risk Prediction

Authors

Hanping Zhang, Yuhong Guo

Abstract

As safety violations can lead to severe consequences in real-world robotic applications, the increasing deployment of Reinforcement Learning (RL) in robotic domains has propelled the study of safe exploration for reinforcement learning (safe RL). In this work, we propose a risk preventive training method for safe RL, which learns a statistical contrastive classifier to predict the probability of a state-action pair leading to unsafe states. Based on the predicted risk probabilities, we can collect risk preventive trajectories and reshape the reward function with risk penalties to induce safe RL policies. We conduct experiments in robotic simulation environments. The results show the proposed approach has comparable performance with the state-of-the-art model-based methods and outperforms conventional model-free safe RL approaches.
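
The two uses of the predicted risk probability mentioned in the abstract (steering data collection away from risky actions, and penalizing the reward) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the linear penalty form, the `penalty_weight` and `threshold` parameters, the function names, and the fallback to the least risky action. The `risk_of` callable stands in for a trained contrastive classifier.

```python
from typing import Callable, Sequence, TypeVar

Action = TypeVar("Action")


def reshape_reward(reward: float, risk_prob: float,
                   penalty_weight: float = 1.0) -> float:
    """Subtract a penalty proportional to the predicted risk probability.

    The linear form and `penalty_weight` are illustrative assumptions.
    """
    if not 0.0 <= risk_prob <= 1.0:
        raise ValueError("risk_prob must lie in [0, 1]")
    return reward - penalty_weight * risk_prob


def pick_low_risk_action(actions: Sequence[Action],
                         risk_of: Callable[[Action], float],
                         threshold: float = 0.5) -> Action:
    """Return the first candidate whose predicted risk is below `threshold`.

    If every candidate is at or above the threshold, fall back to the
    least risky one (a hypothetical policy, chosen for this sketch).
    """
    if not actions:
        raise ValueError("actions must be non-empty")
    for action in actions:
        if risk_of(action) < threshold:
            return action
    return min(actions, key=risk_of)


# Usage: hypothetical risk estimates for two candidate actions.
risks = {"forward": 0.9, "turn": 0.1}
chosen = pick_low_risk_action(list(risks), risks.__getitem__)
shaped = reshape_reward(1.0, risks[chosen], penalty_weight=2.0)
```

In this toy setup the agent avoids the action with high predicted risk and still pays a small penalty for the residual risk of the action it takes.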
