Paper title
Why adversarial training can hurt robust accuracy
Paper authors
Paper abstract
Machine learning classifiers with high test accuracy often perform poorly under adversarial attacks. It is commonly believed that adversarial training alleviates this issue. In this paper, we demonstrate that, surprisingly, the opposite may be true -- even though adversarial training helps when enough data is available, it may hurt robust generalization in the small sample size regime. We first prove this phenomenon for a high-dimensional linear classification setting with noiseless observations. Our proof provides explanatory insights that may also transfer to feature learning models. Further, we observe in experiments on standard image datasets that the same behavior occurs for perceptible attacks that effectively reduce class information, such as mask attacks and object corruptions.
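To make the described setting concrete, below is a minimal, hypothetical simulation sketch, not the paper's exact construction or experiments: a high-dimensional linear classification problem with noiseless labels, a small training set, and logistic-regression training with and without the closed-form l_inf worst-case perturbation for linear models. All dimensions, sample sizes, the attack budget `eps`, and the ground-truth direction `theta_star` are illustrative assumptions; whether the gap appears depends on these choices.

```python
# Hypothetical sketch: compare standard vs. l_inf adversarially trained linear
# classifiers in a small-sample, high-dimensional setting and report robust accuracy.
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test, eps = 1000, 50, 2000, 0.1   # dimension, sample sizes, attack budget (illustrative)

theta_star = np.zeros(d)
theta_star[0] = 1.0                              # ground-truth direction; labels are noiseless

def sample(n):
    X = rng.normal(size=(n, d))
    y = np.sign(X @ theta_star)
    return X, y

def train(X, y, adversarial, lr=0.1, steps=2000):
    """Gradient descent on the logistic loss; if adversarial, use the closed-form
    l_inf worst-case margin for linear models: y * <theta, x> - eps * ||theta||_1."""
    theta = np.zeros(d)
    for _ in range(steps):
        margins = y * (X @ theta)
        if adversarial:
            margins = margins - eps * np.abs(theta).sum()
        p = 1.0 / (1.0 + np.exp(margins))         # -d(logistic loss)/d(margin)
        grad = -(X * (y * p)[:, None]).mean(axis=0)
        if adversarial:
            grad += eps * p.mean() * np.sign(theta)
        theta -= lr * grad
    return theta

def robust_accuracy(theta, X, y):
    worst_margin = y * (X @ theta) - eps * np.abs(theta).sum()
    return (worst_margin > 0).mean()

X_tr, y_tr = sample(n_train)
X_te, y_te = sample(n_test)
for adv in (False, True):
    theta = train(X_tr, y_tr, adversarial=adv)
    print(f"adversarial_training={adv}: robust accuracy = {robust_accuracy(theta, X_te, y_te):.3f}")
```

The sketch uses the standard fact that for a linear score the l_inf-bounded worst-case perturbation shifts the margin by exactly eps times the l1 norm of the weights; it is meant only to illustrate the kind of small-sample comparison the abstract refers to, not to reproduce the paper's proof or results.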