Paper Title
Frequency Regularization for Improving Adversarial Robustness
Paper Authors
Paper Abstract
Deep neural networks are highly vulnerable to carefully crafted, human-imperceptible adversarial perturbations. Although adversarial training (AT) has proven to be an effective defense, we find that AT-trained models rely heavily on the low-frequency content of the input for their predictions, which accounts for their low standard accuracy. To close the large gap between standard and robust accuracy under AT, we investigate the frequency difference between clean and adversarial inputs and propose frequency regularization (FR) to align the output difference in the spectral domain. In addition, we find that Stochastic Weight Averaging (SWA), by smoothing the kernels over epochs, further improves robustness. Among various defense schemes, our method achieves the strongest robustness against PGD-20, C&W, and AutoAttack on a WideResNet trained on CIFAR-10 without any extra data.
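The two ingredients named in the abstract — a spectral-domain alignment term (FR) and weight averaging over epochs (SWA) — can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the use of an L1 distance over the full 2D spectrum and a simple running average for SWA are assumptions, and the function names are hypothetical.

```python
import numpy as np

def frequency_regularization_loss(clean_out, adv_out):
    """Sketch of a frequency-regularization term: transform the model's
    outputs for clean and adversarial inputs into the spectral domain
    and penalize their discrepancy (L1 on the complex spectrum here;
    the exact norm used in the paper is an assumption)."""
    clean_spec = np.fft.fft2(clean_out)
    adv_spec = np.fft.fft2(adv_out)
    return float(np.mean(np.abs(clean_spec - adv_spec)))

def swa_update(swa_weights, new_weights, n_averaged):
    """Sketch of Stochastic Weight Averaging: maintain a running average
    of the kernel weights across epochs, which smooths them over time."""
    return (swa_weights * n_averaged + new_weights) / (n_averaged + 1)

# Toy usage: identical clean/adversarial outputs incur zero FR loss.
x = np.random.default_rng(0).standard_normal((8, 8))
print(frequency_regularization_loss(x, x))        # 0.0
print(frequency_regularization_loss(x, x + 0.1))  # > 0
```

In a real training loop, the FR term would be added to the adversarial-training loss with a weighting coefficient, and `swa_update` would be applied to each layer's weights at the end of selected epochs.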