Paper Title
CE-based white-box adversarial attacks will not work using super-fitting
Paper Authors
Paper Abstract
Deep neural networks are widely used in various fields because of their powerful performance. However, recent studies have shown that deep learning models are vulnerable to adversarial attacks: a slight perturbation added to the input can cause the model to produce incorrect results. This is especially dangerous for systems with high security requirements, so this paper proposes a new defense method that uses the model's super-fitting state to improve the model's adversarial robustness (i.e., its accuracy under adversarial attacks). This paper mathematically proves the effectiveness of super-fitting and enables the model to reach this state quickly by minimizing unrelated category scores (MUCS). Theoretically, super-fitting can resist any existing (and even future) CE-based white-box adversarial attack. In addition, this paper evaluates the adversarial robustness of super-fitting with a variety of powerful attack algorithms and compares the proposed method with nearly 50 defense models from recent conferences. The experimental results show that the super-fitting method proposed in this paper enables the trained model to achieve the highest adversarial robustness.
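For context, the "CE-based white-box adversarial attacks" mentioned in the abstract are attacks that use gradient access to the model to maximize the cross-entropy (CE) loss, PGD being the standard example. The following is a minimal PGD-with-CE sketch in PyTorch; it is background illustration only, not the paper's method, and the function name, hyperparameters, and inputs (`model`, `images`, `labels`) are illustrative placeholders.

```python
# Minimal sketch of a standard CE-based white-box attack (PGD with cross-entropy loss).
# Illustrative only; names and hyperparameters are not taken from the paper.
import torch
import torch.nn.functional as F

def pgd_ce_attack(model, images, labels, eps=8/255, alpha=2/255, steps=10):
    """Projected Gradient Descent that maximizes the cross-entropy loss within an L-inf ball."""
    x_adv = images.clone().detach()
    # Common PGD variant: random start inside the eps-ball.
    x_adv = x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)
    x_adv = torch.clamp(x_adv, 0.0, 1.0)

    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), labels)   # the CE loss the attack maximizes
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()         # gradient-ascent step
            x_adv = torch.min(torch.max(x_adv, images - eps), images + eps)  # project to eps-ball
            x_adv = torch.clamp(x_adv, 0.0, 1.0)        # keep valid pixel range
        x_adv = x_adv.detach()
    return x_adv
```

The paper's claim is that a model driven into the super-fitting state (via MUCS training) cannot be broken by attacks of this CE-gradient form, existing or future.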