Paper Title
Defending against substitute model black box adversarial attacks with the 01 loss
Paper Authors
Paper Abstract
Substitute model black box attacks can create adversarial examples for a target model just by accessing its output labels. This poses a major challenge to machine learning models in practice, particularly in security-sensitive applications. The 01 loss model is known to be more robust to outliers and noise than the convex models typically used in practice. Motivated by these properties, we present 01 loss linear and 01 loss dual layer neural network models as a defense against transfer-based substitute model black box attacks. We compare the accuracy of adversarial examples from substitute model black box attacks targeting our 01 loss models and their convex counterparts for binary classification on popular image benchmarks. Our 01 loss dual layer neural network has an adversarial accuracy of 66.2%, 58%, 60.5%, and 57% on MNIST, CIFAR10, STL10, and ImageNet respectively, whereas the sigmoid-activated logistic loss counterpart has accuracies of 63.5%, 19.3%, 14.9%, and 27.6%. Except for MNIST, the convex counterparts have substantially lower adversarial accuracies. We show practical applications of our models to deter traffic sign and facial recognition adversarial attacks. On GTSRB street signs and CelebA facial detection, our 01 loss network has 34.6% and 37.1% adversarial accuracy respectively, whereas the convex logistic counterpart has accuracies of 24% and 1.9%. Finally, we show that our 01 loss network can attain robustness on par with simple convolutional neural networks, and much higher than that of its convex counterpart, even when attacked with a convolutional network substitute model. Our work shows that 01 loss models offer a powerful defense against substitute model black box attacks.
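For readers unfamiliar with the 01 loss, the minimal sketch below contrasts it with the convex logistic loss for a binary linear classifier. This is an illustration only: the function names and toy data are ours, and the paper's actual optimization procedure for training 01 loss models (the 01 loss is non-convex and non-differentiable, so it cannot be trained by plain gradient descent) is not reproduced here.

```python
# Sketch: 0-1 loss vs. convex logistic loss for a linear classifier.
# Names and toy data are illustrative, not from the paper.
import numpy as np

def zero_one_loss(w, b, X, y):
    """Fraction of misclassified points; labels y are +1/-1."""
    preds = np.sign(X @ w + b)
    return np.mean(preds != y)

def logistic_loss(w, b, X, y):
    """Convex surrogate typically minimized in practice."""
    margins = y * (X @ w + b)
    return np.mean(np.log1p(np.exp(-margins)))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # toy inputs
y = np.sign(X @ rng.normal(size=5))    # toy +1/-1 labels
w, b = rng.normal(size=5), 0.0         # an arbitrary classifier

print(zero_one_loss(w, b, X, y), logistic_loss(w, b, X, y))
```

Note that the 0-1 loss only counts errors, so a single outlier shifts it by at most 1/n, whereas the logistic loss can grow without bound on one badly misclassified point; this is the robustness property the abstract appeals to.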
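The transfer-based attack itself can also be sketched. The following is a schematic, hypothetical version assuming a logistic-regression substitute and a single FGSM perturbation; the substitute architectures, query strategy, and attack parameters used in the paper are not shown. The attacker only calls target_predict (a name we introduce here) for output labels, matching the black box setting described above.

```python
# Sketch of a transfer-based substitute model black box attack:
# train a substitute on the target's output labels, then craft
# FGSM adversarial examples on the substitute and transfer them.
import numpy as np

def fit_substitute(X, y, lr=0.1, iters=500):
    """Logistic-regression substitute trained on the target's +1/-1 labels."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(iters):
        m = y * (X @ w + b)
        s = 1.0 / (1.0 + np.exp(m))        # sigmoid(-margin)
        w -= lr * (X.T @ (-y * s)) / n     # gradient of logistic loss in w
        b -= lr * np.mean(-y * s)          # gradient of logistic loss in b
    return w, b

def transfer_attack(target_predict, X_seed, eps=0.1):
    y = target_predict(X_seed)             # black box access: labels only
    w, _ = fit_substitute(X_seed, y)
    # FGSM on the substitute: for the logistic loss the input gradient
    # has sign(-y * w), so step against the correct label's margin.
    return X_seed + eps * np.sign(-y[:, None] * w[None, :])
```

Evaluating target_predict on the returned adversarial inputs gives the adversarial accuracy the abstract reports; the paper's claim is that a 01 loss target retains substantially higher accuracy on such transferred examples than its convex counterpart.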