Title

Testing robustness of predictions of trained classifiers against naturally occurring perturbations

Authors

Sebastian Scher, Andreas Trügler

Abstract

Correctly quantifying the robustness of machine learning models is a central aspect in judging their suitability for specific tasks, and ultimately, for generating trust in them. We address the problem of finding the robustness of individual predictions. We show both theoretically and with empirical examples that a method based on counterfactuals that was previously proposed for this is insufficient, as it is not a valid metric for determining the robustness against perturbations that occur "naturally", outside specific adversarial attack scenarios. We propose a flexible approach that models possible perturbations in input data individually for each application. This is then combined with a probabilistic approach that computes the likelihood that a "real-world" perturbation will change a prediction, thus giving quantitative information of the robustness of individual predictions of the trained machine learning model. The method does not require access to the internals of the classifier and thus in principle works for any black-box model. It is, however, based on Monte-Carlo sampling and thus only suited for input spaces with small dimensions. We illustrate our approach on the Iris and the Ionosphere datasets, on an application predicting fog at an airport, and on analytically solvable cases.
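The abstract outlines a Monte-Carlo procedure: draw samples from an application-specific perturbation model, query the black-box classifier on the perturbed inputs, and estimate the probability that a perturbation changes the prediction. The sketch below illustrates this idea on the Iris dataset; the isotropic Gaussian perturbation model, the `sigma` scale, and the `perturbation_flip_probability` helper are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

def perturbation_flip_probability(predict_fn, x, sample_perturbation, n_samples=1000):
    """Monte-Carlo estimate of the probability that a perturbation of
    input x changes the classifier's prediction (lower = more robust)."""
    base_pred = predict_fn(x.reshape(1, -1))[0]
    # Draw Monte-Carlo samples from the perturbation model around x.
    perturbed = np.array([x + sample_perturbation() for _ in range(n_samples)])
    preds = predict_fn(perturbed)
    # Fraction of perturbed inputs whose prediction differs from the original.
    return np.mean(preds != base_pred)

# Example: black-box classifier on the Iris dataset.
X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(random_state=0).fit(X, y)

rng = np.random.default_rng(0)
sigma = 0.1  # assumed perturbation scale; in practice modeled per application
perturb = lambda: rng.normal(0.0, sigma, size=X.shape[1])

p_flip = perturbation_flip_probability(clf.predict, X[0], perturb, n_samples=2000)
print(f"Estimated flip probability for sample 0: {p_flip:.3f}")
```

The complement, 1 - p_flip, can then be read as a robustness score for that individual prediction; note that, as the abstract states, this sampling-based estimate only remains tractable for input spaces of small dimension.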
