论文标题
安全限制的积极学习
Active Learning with Safety Constraints
论文作者
论文摘要
积极的学习方法在减少学习所需的样本数量方面表现出了巨大的希望。随着自动化学习系统被采用到实时的现实世界决策管道中,越来越重要的是,这种算法的设计考虑到了安全性。在这项工作中,我们研究了在交互式环境中学习最佳安全决定的复杂性。我们将这个问题减少到一个约束的线性匪徒问题,我们的目标是找到满足某些(未知)安全限制的最佳手臂。我们提出了一种基于自适应的实验性设计算法,在显示手臂的难度与次优的难度之间,我们表现出了有效的交易。据我们所知,我们的结果是具有安全限制的线性匪徒中最佳武器识别的第一个结果。在实践中,我们证明了这种方法在合成和现实世界数据集上表现良好。
Active learning methods have shown great promise in reducing the number of samples necessary for learning. As automated learning systems are adopted into real-time, real-world decision-making pipelines, it is increasingly important that such algorithms are designed with safety in mind. In this work we investigate the complexity of learning the best safe decision in interactive environments. We reduce this problem to a constrained linear bandits problem, where our goal is to find the best arm satisfying certain (unknown) safety constraints. We propose an adaptive experimental design-based algorithm, which we show efficiently trades off between the difficulty of showing an arm is unsafe vs suboptimal. To our knowledge, our results are the first on best-arm identification in linear bandits with safety constraints. In practice, we demonstrate that this approach performs well on synthetic and real world datasets.