Paper Title

On the transferability of adversarial examples between convex and 01 loss models

Paper Authors

Yunzhe Xue, Meiyan Xie, Usman Roshan

Paper Abstract

The 01 loss gives different and more accurate boundaries than convex loss models in the presence of outliers. Could this difference in boundaries translate into adversarial examples that are non-transferable between 01 loss and convex models? We explore this question empirically by studying the transferability of adversarial examples between linear 01 loss and convex (hinge) loss models, and between dual layer neural networks with sign activation and 01 loss vs. sigmoid activation and logistic loss. We first show that white box adversarial examples do not transfer effectively between convex and 01 loss models, or between different 01 loss models, compared to between convex models. As a result of this non-transferability, convex substitute model black box attacks are less effective on 01 loss models than on convex models. Interestingly, we also see that 01 loss substitute model attacks are ineffective on both convex and 01 loss models, most likely due to the non-uniqueness of 01 loss models. We show intuitively by example how the presence of outliers can cause different decision boundaries between 01 and convex loss models, which in turn produces adversaries that are non-transferable. Indeed, we see on MNIST that adversaries transfer between 01 loss and convex models more easily than on CIFAR10 and ImageNet, which are more likely to contain outliers. We also show intuitively by example how the discontinuity of 01 loss makes adversaries non-transferable in a dual layer neural network. We discretize CIFAR10 features to be more like MNIST and find that this does not improve transferability, suggesting that different boundaries due to outliers are the more likely cause of non-transferability. As a result of this non-transferability, we show that our dual layer sign activation network with 01 loss can attain robustness on par with simple convolutional networks.
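The central claim, that an outlier pulls a convex-loss boundary away from where the 01 loss places it, is easy to see in one dimension. Below is a minimal sketch of our own (toy data and a brute-force fit, not the authors' code) comparing where each loss puts a linear threshold:

```python
import numpy as np

# Toy 1D data: two clean clusters plus one far-away mislabeled point.
# All values are made up for illustration.
X = np.array([-0.2, 0.0, 0.1, 0.9, 1.0, 1.2, 8.0])
y = np.array([-1, -1, -1, 1, 1, 1, -1])  # x = 8.0 is a -1 outlier

def loss01(b):
    """Fraction of points misclassified by sign(x + b)."""
    return np.mean(np.sign(X + b) != y)

def hinge(b):
    """Average hinge loss max(0, 1 - y*(x + b))."""
    return np.mean(np.maximum(0.0, 1.0 - y * (X + b)))

# Brute-force the bias; report the set of optimal boundaries x = -b.
bs = np.linspace(-10, 10, 4001)
for name, f in [("01 loss", loss01), ("hinge  ", hinge)]:
    vals = np.array([f(b) for b in bs])
    opt = -bs[np.isclose(vals, vals.min())]
    print(f"{name}: optimal boundary x in [{opt.min():.2f}, {opt.max():.2f}]")
# Expected: 01 loss keeps the boundary anywhere between the clusters
# (~[0.1, 0.9]), accepting one error on the outlier; the hinge optima are
# dragged toward the outlier (~[0.8, 1.0]), up against the clean +1 cluster.
```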
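The transferability measurement itself can be sketched the same way: fit a hinge-loss source model, move each point just across its hyperplane (a minimal L2 white-box attack), and check how often the perturbed points also fool a 01 loss target. Everything below, the synthetic 2D data, the outliers, the brute-force 01 fitter, and the use of scikit-learn's LinearSVC as the hinge model, is an illustrative assumption rather than the authors' experimental setup:

```python
import numpy as np
from sklearn.svm import LinearSVC  # hinge-loss source model

rng = np.random.default_rng(0)
# Hypothetical 2D data: two clean clusters plus a few mislabeled outliers.
n = 200
X = np.vstack([rng.normal([ 2.0, 0.0], 0.7, (n, 2)),    # class +1
               rng.normal([-2.0, 0.0], 0.7, (n, 2)),    # class -1
               rng.normal([ 8.0, 8.0], 0.5, (10, 2))])  # -1 outliers
y = np.concatenate([np.ones(n), -np.ones(n), -np.ones(10)])

svm = LinearSVC(C=1.0, max_iter=10000).fit(X, y)
w_s, b_s = svm.coef_[0], svm.intercept_[0]

def fit_01(X, y):
    """Brute-force a linear 01 loss model sign(w.x - b) over angle and offset."""
    best_err, best_wb = np.inf, None
    for theta in np.linspace(0.0, 2 * np.pi, 360, endpoint=False):
        w = np.array([np.cos(theta), np.sin(theta)])
        proj = X @ w
        for b in np.linspace(proj.min(), proj.max(), 200):
            err = np.mean(np.sign(proj - b) != y)
            if err < best_err:
                best_err, best_wb = err, (w, b)
    return best_wb

w_t, b_t = fit_01(X, y)

# White-box attack on the source: move each point just past its hyperplane.
wn = w_s / np.linalg.norm(w_s)
margins = (X @ w_s + b_s) / np.linalg.norm(w_s)  # signed distance to boundary
X_adv = X - (margins + 0.1 * np.sign(margins))[:, None] * wn

err_src = np.mean(np.sign(X_adv @ w_s + b_s) != y)  # high by construction:
                                                    # every correct point crosses
err_tgt = np.mean(np.sign(X_adv @ w_t - b_t) != y)  # transfer rate to 01 model
print(f"adversarial error: source (hinge) {err_src:.2f}, target (01) {err_tgt:.2f}")
# The gap between the two numbers is the non-transferability the paper studies;
# its size here depends on how far the outliers tilt the hinge boundary, and
# exact values vary with the random draw.
```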
