Paper Title

Improving adversarial robustness of deep neural networks by using semantic information

Authors

Lina Wang, Rui Tang, Yawei Yue, Xingshu Chen, Wei Wang, Yi Zhu, Xuemei Zeng

Abstract

The vulnerability of deep neural networks (DNNs) to adversarial attacks, which can mislead state-of-the-art classifiers into making incorrect classifications with high confidence by deliberately perturbing the original inputs, raises concerns about the robustness of DNNs to such attacks. Adversarial training, the main heuristic method for improving adversarial robustness and the first line of defense against adversarial attacks, requires many per-sample computations to enlarge the training set and is usually insufficiently strong for an entire network. This paper provides a new perspective on the issue of adversarial robustness, one that shifts the focus from the network as a whole to the critical part of the region close to the decision boundary corresponding to a given class. From this perspective, we propose a method to generate a single but image-agnostic adversarial perturbation that carries the semantic information implying the directions to the fragile parts on the decision boundary and causes inputs to be misclassified as a specified target. We call the adversarial training based on such perturbations "region adversarial training" (RAT), which resembles classical adversarial training but is distinguished in that it reinforces the semantic information missing in the relevant regions. Experimental results on the MNIST and CIFAR-10 datasets show that this approach greatly improves adversarial robustness even when only a very small subset of the training data is used; moreover, it can defend against FGSM adversarial attacks whose patterns differ completely from those seen during retraining.
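
The abstract describes two steps: crafting one image-agnostic perturbation that pushes inputs toward a specified target class, and then retraining the model on data shifted by that single perturbation. The sketch below only illustrates that general idea under assumed details; it uses PyTorch, a toy classifier (SmallNet), and hypothetical helper names (make_universal_perturbation, region_adversarial_training), and does not reproduce the paper's actual RAT procedure.

```python
# Minimal sketch (not the authors' released code): one shared, targeted
# perturbation is fitted on a small batch of images, then the model is
# retrained on clean and perturbed copies of that batch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):
    """Toy MNIST-sized classifier standing in for the DNN under attack."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(), nn.Linear(256, 10)
        )

    def forward(self, x):
        return self.net(x)

def make_universal_perturbation(model, images, target_class, eps=0.3, steps=50, lr=0.05):
    """Craft ONE perturbation shared by all images that steers them toward
    the specified target class (an image-agnostic, targeted attack)."""
    delta = torch.zeros_like(images[:1], requires_grad=True)   # single shared delta
    target = torch.full((images.size(0),), target_class, dtype=torch.long)
    for _ in range(steps):
        loss = F.cross_entropy(model(images + delta), target)  # drive this DOWN
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()                     # targeted sign step
            delta.clamp_(-eps, eps)                             # keep perturbation small
        delta.grad.zero_()
    return delta.detach()

def region_adversarial_training(model, images, labels, delta, epochs=5, lr=1e-3):
    """Retrain on clean and perturbed copies of a small data subset."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(model(images), labels) + \
               F.cross_entropy(model((images + delta).clamp(0, 1)), labels)
        loss.backward()
        opt.step()

if __name__ == "__main__":
    model = SmallNet()
    x = torch.rand(64, 1, 28, 28)             # stand-in for a small MNIST subset
    y = torch.randint(0, 10, (64,))
    delta = make_universal_perturbation(model, x, target_class=3)
    region_adversarial_training(model, x, y, delta)
```

The key design point mirrored here is that the perturbation is optimized once and shared across all inputs, so retraining only needs a small subset of the data rather than per-sample adversarial examples.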
