Title

A General Framework For Detecting Anomalous Inputs to DNN Classifiers

Authors

Jayaram Raghuram, Varun Chandrasekaran, Somesh Jha, Suman Banerjee

Abstract

Detecting anomalous inputs, such as adversarial and out-of-distribution (OOD) inputs, is critical for classifiers (including deep neural networks, or DNNs) deployed in real-world applications. While prior works have proposed various methods to detect such anomalous samples using information from the internal layer representations of a DNN, there is a lack of consensus on a principled approach for the different components of such a detection method. As a result, heuristic and one-off methods are often applied to different aspects of this problem. We propose an unsupervised anomaly detection framework based on the internal DNN layer representations, in the form of a meta-algorithm with configurable components. We proceed to propose specific instantiations for each component of the meta-algorithm based on ideas grounded in statistical testing and anomaly detection. We evaluate the proposed methods on well-known image classification datasets with strong adversarial attacks and OOD inputs, including an adaptive attack that uses the internal layer representations of the DNN (often not considered in prior work). Comparisons with five recently-proposed competing detection methods demonstrate the effectiveness of our method in detecting adversarial and OOD inputs.
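The abstract describes a meta-algorithm that scores a test input at each internal DNN layer and aggregates the per-layer evidence via statistical testing. Below is a minimal, hypothetical sketch of that idea (not the paper's actual method): each layer produces an empirical p-value from a k-nearest-neighbor distance score relative to a clean calibration set, and the p-values are aggregated with Fisher's method. All variable names, the choice of k-NN scores, and the random data are illustrative assumptions.

```python
import numpy as np
from scipy.stats import combine_pvalues  # Fisher's method for aggregating p-values

rng = np.random.default_rng(0)

# Hypothetical per-layer feature vectors for a "clean" calibration set and
# for a single test input (layer sizes and data are illustrative only).
n_calib, layer_dims = 200, [64, 32]
calib_layers = [rng.normal(size=(n_calib, d)) for d in layer_dims]
test_layers = [rng.normal(size=d) for d in layer_dims]

def layer_pvalue(calib, x, k=10):
    """Empirical p-value of the test input's k-NN distance at one layer.

    Score = distance from x to its k-th nearest calibration point; the
    p-value is the fraction of calibration points whose own (leave-one-out)
    k-NN distance is at least as large as the test score.
    """
    d_test = np.sort(np.linalg.norm(calib - x, axis=1))[k - 1]
    # Pairwise distances within the calibration set
    pair = np.linalg.norm(calib[:, None, :] - calib[None, :, :], axis=2)
    d_cal = np.sort(pair, axis=1)[:, k]  # index k skips the zero self-distance
    return (1 + np.sum(d_cal >= d_test)) / (1 + len(calib))

# One p-value per layer, then a single aggregate p-value for the input
pvals = [layer_pvalue(c, x) for c, x in zip(calib_layers, test_layers)]
_, p_agg = combine_pvalues(pvals, method="fisher")
print(p_agg)  # a small aggregate p-value would flag the input as anomalous
```

In this sketch, thresholding `p_agg` (e.g., at a significance level chosen on held-out clean data) yields the final detect/accept decision; the paper's framework treats the layer score, the p-value estimation, and the aggregation rule as configurable components.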
