Paper Title
How benign is benign overfitting?
Authors
Abstract
We investigate two causes for adversarial vulnerability in deep neural networks: bad data and (poorly) trained models. When trained with SGD, deep neural networks essentially achieve zero training error, even in the presence of label noise, while also exhibiting good generalization on natural test data, something referred to as benign overfitting [2, 10]. However, these models are vulnerable to adversarial attacks. We identify label noise as one of the causes for adversarial vulnerability, and provide theoretical and empirical evidence in support of this. Surprisingly, we find several instances of label noise in datasets such as MNIST and CIFAR, and that robustly trained models incur training error on some of these, i.e. they don't fit the noise. However, removing noisy labels alone does not suffice to achieve adversarial robustness. Standard training procedures bias neural networks towards learning "simple" classification boundaries, which may be less robust than more complex ones. We observe that adversarial training does produce more complex decision boundaries. We conjecture that in part the need for complex decision boundaries arises from sub-optimal representation learning. By means of simple toy examples, we show theoretically how the choice of representation can drastically affect adversarial robustness.
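As a loose, hypothetical illustration of the abstract's first claim (not a construction from the paper), the following Python sketch uses a 1-nearest-neighbor classifier as a stand-in for an interpolating model: it reaches zero training error on labels with 10% flips, still generalizes well on clean test data, yet most test points can be misclassified by a small perturbation that moves them toward a nearby mislabeled training example. All names and parameter values here (sample, predict_1nn, eps, the noise rate) are illustrative assumptions, not taken from the paper.

# Minimal toy sketch (illustrative assumption, not the paper's construction).
# A 1-nearest-neighbor classifier interpolates a training set with 10% flipped
# labels: training error is zero, clean test accuracy stays high, but moving a
# test point a small distance toward a nearby mislabeled training example is
# usually enough to flip the prediction.
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    # Two well-separated Gaussian clusters in 2D, labels in {0, 1}.
    y = rng.integers(0, 2, size=n)
    centers = np.array([[-2.0, 0.0], [2.0, 0.0]])
    return centers[y] + 0.5 * rng.normal(size=(n, 2)), y

def predict_1nn(x_train, y_train, x):
    # Each query point copies the (possibly noisy) label of its nearest training point.
    d = np.linalg.norm(x[:, None, :] - x_train[None, :, :], axis=-1)
    return y_train[d.argmin(axis=1)]

# Training data with 10% label noise; 1-NN fits it exactly (zero training error).
x_tr, y_clean = sample(500)
flipped = rng.random(500) < 0.10
y_tr = np.where(flipped, 1 - y_clean, y_clean)
train_acc = (predict_1nn(x_tr, y_tr, x_tr) == y_tr).mean()

# Clean test accuracy stays high despite the interpolated noise.
x_te, y_te = sample(2000)
clean_acc = (predict_1nn(x_tr, y_tr, x_te) == y_te).mean()

# Toy attack: move each test point up to eps toward its nearest mislabeled
# training example; once it lands on (or very near) that example, the 1-NN
# outputs the flipped label.
eps = 0.3
x_bad = x_tr[flipped]
d_bad = np.linalg.norm(x_te[:, None, :] - x_bad[None, :, :], axis=-1)
idx = d_bad.argmin(axis=1)
direction = x_bad[idx] - x_te
dist = np.linalg.norm(direction, axis=1, keepdims=True)
direction = direction / np.maximum(dist, 1e-12)
x_adv = x_te + np.minimum(eps, dist) * direction
adv_acc = (predict_1nn(x_tr, y_tr, x_adv) == y_te).mean()

print(f"train acc (noisy labels):  {train_acc:.3f}")  # 1.000: the noise is fit
print(f"clean test acc:            {clean_acc:.3f}")  # remains high
print(f"test acc under eps-attack: {adv_acc:.3f}")    # well below clean accuracy

The sketch only illustrates the mechanism the abstract names: once a model fits mislabeled points exactly, a neighborhood around each such point is classified with the wrong label, so small perturbations toward those neighborhoods succeed as adversarial attacks even though clean test accuracy looks benign.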