Paper Title

Adversarial Noises Are Linearly Separable for (Nearly) Random Neural Networks

Paper Authors

Huishuai Zhang, Da Yu, Yiping Lu, Di He

Paper Abstract

Adversarial examples, which are usually generated for specific inputs with a specific model, are ubiquitous for neural networks. In this paper, we unveil a surprising property of adversarial noises when they are put together: adversarial noises crafted by one-step gradient methods are linearly separable if equipped with the corresponding labels. We theoretically prove this property for a two-layer network with randomly initialized entries and for the neural tangent kernel setup, where the parameters are not far from initialization. The proof idea is to show that the label information can be efficiently backpropagated to the input while preserving linear separability. Our theory and experimental evidence further show that a linear classifier trained on the adversarial noises of the training data can classify the adversarial noises of the test data well, indicating that adversarial noises actually inject a distributional perturbation into the original data distribution. Furthermore, we empirically demonstrate that the adversarial noises may become less linearly separable when the above conditions are compromised, while they remain much easier to classify than the original features.
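To make the claim concrete, below is a minimal, self-contained sketch (not the authors' code) of the setup the abstract describes: craft one-step (FGSM-style) adversarial noises for a randomly initialized two-layer network, then fit a linear classifier on (noise, label) pairs. The dimensions, the perturbation budget eps, and the synthetic Gaussian inputs with random labels are illustrative assumptions; the paper's actual experiments use real datasets and also cover the near-initialization (NTK) regime.

```python
# Minimal sketch under stated assumptions (synthetic inputs, random labels,
# eps = 0.1, width = 512) -- not the authors' implementation.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, width, n_classes, n = 100, 512, 10, 2000
eps = 0.1  # hypothetical L_inf perturbation budget

# Two-layer network with randomly initialized entries (never trained).
W1 = torch.randn(width, d) / d ** 0.5
W2 = torch.randn(n_classes, width) / width ** 0.5

def net(x):
    return F.relu(x @ W1.t()) @ W2.t()

# Synthetic stand-in for the data distribution.
x = torch.randn(n, d)
y = torch.randint(0, n_classes, (n,))

# One-step gradient (FGSM) noise: eps * sign of the input gradient of the loss.
x_adv = x.clone().requires_grad_(True)
loss = F.cross_entropy(net(x_adv), y)
grad, = torch.autograd.grad(loss, x_adv)
noise = eps * grad.sign()

# Linear probe on (noise, label) pairs; the split checks whether a classifier
# fit on training noises also classifies held-out noises.
train, test = slice(0, n // 2), slice(n // 2, n)
clf = torch.nn.Linear(d, n_classes)
opt = torch.optim.SGD(clf.parameters(), lr=0.5)
for _ in range(500):
    opt.zero_grad()
    F.cross_entropy(clf(noise[train]), y[train]).backward()
    opt.step()

with torch.no_grad():
    train_acc = (clf(noise[train]).argmax(1) == y[train]).float().mean().item()
    test_acc = (clf(noise[test]).argmax(1) == y[test]).float().mean().item()
print(f"linear probe on noises -- train: {train_acc:.2f}, held-out: {test_acc:.2f}")
```

In this sketch, high training accuracy of the probe corresponds to the linear-separability claim, and high held-out accuracy corresponds to the distributional-perturbation claim; the paper additionally examines how separability degrades when the one-step or near-initialization conditions are relaxed.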
