Paper Title
Safe Inverse Reinforcement Learning via Control Barrier Function
Paper Authors
Paper Abstract
Learning from Demonstration (LfD) is a powerful method for enabling robots to perform novel tasks: it is often more tractable for a non-roboticist end-user to demonstrate the desired skill, and for the robot to learn efficiently from the associated data, than for a human to engineer a reward function from which the robot learns the skill via reinforcement learning (RL). Safety issues arise in modern LfD techniques, e.g., Inverse Reinforcement Learning (IRL), just as they do for RL; yet safe learning in LfD has received little attention. In the context of agile robots, safety is especially vital due to the possibility of robot-environment collisions, robot-human collisions, and damage to the robot. In this paper, we propose a safe IRL framework, CBFIRL, that leverages the Control Barrier Function (CBF) to enhance the safety of the IRL policy. The core idea of CBFIRL is to combine a loss function inspired by CBF requirements with the objective of an IRL method, both of which are jointly optimized via gradient descent. In experiments, we show our framework achieves safer behavior than IRL methods without CBF: $\sim15\%$ and $\sim20\%$ safety improvements on two difficulty levels of a 2D racecar domain, and a $\sim50\%$ improvement on a 3D drone domain.
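The abstract describes the mechanism only at a high level: a CBF-inspired penalty is added to the IRL objective and both are minimized together by gradient descent. The sketch below illustrates what such a joint objective could look like, assuming a PyTorch-style setup and an illustrative discrete-time CBF condition $h(x_{t+1}) - h(x_t) + \alpha h(x_t) \geq 0$; the barrier function `h`, the dynamics model `f`, the IRL loss `irl_loss_fn`, and the weight `lam` are all hypothetical placeholders, not the paper's actual implementation.

```python
# Illustrative sketch (not the paper's code): combining an IRL loss with a
# CBF-violation penalty and optimizing both jointly by gradient descent.
import torch
import torch.nn as nn


class Policy(nn.Module):
    """Simple MLP policy; architecture is an assumption for illustration."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def cbf_violation_loss(h, f, policy, states, alpha: float = 1.0) -> torch.Tensor:
    """Hinge penalty on violations of a discrete-time CBF condition:
    h(x') - h(x) + alpha * h(x) >= 0, with x' = f(x, pi(x)).
    `h` (barrier) and `f` (differentiable dynamics) are assumed callables."""
    actions = policy(states)
    next_states = f(states, actions)
    residual = h(next_states) - h(states) + alpha * h(states)
    return torch.relu(-residual).mean()


def training_step(policy, optimizer, irl_loss_fn, h, f, batch, lam: float = 10.0):
    """One gradient step on the combined objective: IRL loss + lam * CBF loss."""
    states = batch["states"]
    loss = irl_loss_fn(policy, batch) + lam * cbf_violation_loss(h, f, policy, states)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the safety term is a soft penalty weighted by `lam`, so safety and imitation are traded off within a single gradient-descent loop rather than enforced as a hard constraint; the choice of penalty weight and barrier function would determine how closely the learned policy respects the CBF condition.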