Runtime Safety Assurance Using Reinforcement Learning

Paper Title

Runtime Safety Assurance Using Reinforcement Learning

Authors

Christopher Lazarus, James G. Lopez, Mykel J. Kochenderfer

Abstract

The airworthiness and safety of a non-pedigreed autopilot must be verified, but the cost of doing so formally can be prohibitive. We can bypass formal verification of non-pedigreed components by incorporating Runtime Safety Assurance (RTSA) as a mechanism to ensure safety. RTSA consists of a meta-controller that observes the inputs and outputs of a non-pedigreed component and verifies formally specified behavior as the system operates. When the system is triggered, a verified recovery controller is deployed. Recovery controllers are designed to be safe but are very likely disruptive to the operational objective of the system, and thus RTSA systems must balance safety and efficiency. The objective of this paper is to design a meta-controller capable of identifying unsafe situations with high accuracy. The high-dimensional, nonlinear dynamics under which modern controllers are deployed, along with the black-box nature of the nominal controllers, make this a difficult problem. Current approaches rely heavily on domain expertise and human engineering. We frame the design of RTSA with the Markov decision process (MDP) framework and use reinforcement learning (RL) to solve it. Our learned meta-controller consistently exhibits superior performance in our experiments compared to our human-engineered baseline approach.
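To make the MDP framing in the abstract more concrete, the following is a minimal Python sketch of a meta-controller choosing between a black-box nominal controller and a recovery controller. It is not the authors' implementation. The toy 1-D plant, the names `RTSAEnv`, `rollout`, `never_trigger`, and `threshold_policy`, the reward values, and the rule that recovery stays engaged once triggered are all assumptions made for illustration. The hand-written threshold rule stands in for a human-engineered baseline; an RL algorithm would instead learn the policy from the rewards.

```python
from dataclasses import dataclass
from typing import Callable

NOMINAL, RECOVERY = 0, 1  # meta-controller actions (hypothetical encoding)


@dataclass
class RTSAEnv:
    """Toy 1-D plant wrapped as an MDP for the meta-controller (illustrative only)."""
    nominal: Callable[[float], float]   # black-box nominal controller
    recovery: Callable[[float], float]  # recovery controller, assumed verified
    limit: float = 1.0                  # assumed safety spec: |x| <= limit
    horizon: int = 20
    x: float = 0.0
    t: int = 0
    recovering: bool = False

    def reset(self, x0: float = 0.0) -> float:
        self.x, self.t, self.recovering = x0, 0, False
        return self.x

    def step(self, action: int):
        reward = 0.0
        # Assumption: once triggered, the recovery controller stays engaged.
        if action == RECOVERY and not self.recovering:
            self.recovering = True
            reward -= 1.0  # assumed cost of disrupting the mission
        controller = self.recovery if self.recovering else self.nominal
        self.x += controller(self.x)
        self.t += 1
        unsafe = abs(self.x) > self.limit
        if unsafe:
            reward -= 100.0  # assumed penalty for violating the safety spec
        done = unsafe or self.t >= self.horizon
        return self.x, reward, done


def rollout(env: RTSAEnv, policy: Callable[[float], int], x0: float = 0.0) -> float:
    """Run one episode and return the total reward collected by the policy."""
    state, total, done = env.reset(x0), 0.0, False
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total


def never_trigger(state: float) -> int:
    return NOMINAL


def threshold_policy(state: float) -> int:
    # Stand-in for a human-engineered rule; a learned policy would replace this.
    return RECOVERY if state > 0.5 else NOMINAL


def make_env() -> RTSAEnv:
    # The nominal controller drifts toward the unsafe region;
    # the recovery controller pulls the state back toward zero.
    return RTSAEnv(nominal=lambda x: 0.3, recovery=lambda x: -0.5 * x)
```

Under these assumptions the reward captures the trade-off described in the abstract: triggering recovery has a small cost because it disrupts the operational objective, while failing to trigger before the state leaves the safe region has a much larger one.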
