Paper Title
Improving Adversarial Robustness with Self-Paced Hard-Class Pair Reweighting
Paper Authors
Paper Abstract
Deep Neural Networks are vulnerable to adversarial attacks. Among many defense strategies, adversarial training with untargeted attacks is one of the most effective methods. Theoretically, adversarial perturbation in untargeted attacks can be added along arbitrary directions and the predicted labels of untargeted attacks should be unpredictable. However, we find that the naturally imbalanced inter-class semantic similarity makes those hard-class pairs become virtual targets of each other. This study investigates the impact of such closely-coupled classes on adversarial attacks and develops a self-paced reweighting strategy in adversarial training accordingly. Specifically, we propose to upweight hard-class pair losses in model optimization, which prompts learning discriminative features from hard classes. We further incorporate a term to quantify hard-class pair consistency in adversarial training, which greatly boosts model robustness. Extensive experiments show that the proposed adversarial training method achieves superior robustness performance over state-of-the-art defenses against a wide range of adversarial attacks.
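To make the abstract's mechanics concrete, below is a minimal PyTorch sketch of the idea: track which class pairs the model confuses under attack, upweight the losses of samples falling into those hard pairs, and add a consistency penalty. This is an illustrative approximation, not the paper's exact formulation; the PGD settings, the HardPairReweighter weighting function, and the KL-based consistency term are all assumptions introduced here for demonstration.

```python
# Minimal sketch of self-paced hard-class pair reweighting in adversarial
# training. Illustrative only: the attack settings, the weighting function,
# and the consistency term are assumptions, not the paper's exact method.
import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard untargeted L-infinity PGD (assumed attack settings)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)
    return x_adv.detach()


class HardPairReweighter:
    """Tracks how often class c is adversarially misclassified as class c'.
    Frequently confused ("hard") class pairs receive larger loss weights;
    because the weights come from the model's own current mistakes, the
    schedule is self-paced."""

    def __init__(self, num_classes, momentum=0.9):
        self.num_classes = num_classes
        self.momentum = momentum
        self.confusion = torch.zeros(num_classes, num_classes)

    def update(self, y_true, y_pred):
        batch = torch.zeros_like(self.confusion)
        for t, p in zip(y_true.cpu(), y_pred.cpu()):
            if t != p:
                batch[t, p] += 1.0
        if batch.sum() > 0:
            batch /= batch.sum()
        self.confusion = self.momentum * self.confusion + (1 - self.momentum) * batch

    def weights(self, y_true, y_pred):
        # Upweight samples whose (true, predicted) classes form a hard pair.
        w = 1.0 + self.num_classes * self.confusion[y_true.cpu(), y_pred.cpu()]
        return w.to(y_true.device)


def adv_training_step(model, optimizer, x, y, reweighter, beta=1.0):
    x_adv = pgd_attack(model, x, y)
    logits_adv = model(x_adv)
    y_pred = logits_adv.argmax(dim=1)

    reweighter.update(y, y_pred)
    w = reweighter.weights(y, y_pred)

    # Hard-class pairs contribute more to the robust loss.
    per_sample = F.cross_entropy(logits_adv, y, reduction="none")
    robust_loss = (w * per_sample).mean()

    # Hedged stand-in for the paper's hard-class pair consistency term:
    # penalize divergence between adversarial and clean predictions.
    consistency = F.kl_div(
        F.log_softmax(logits_adv, dim=1),
        F.softmax(model(x), dim=1),
        reduction="batchmean",
    )

    loss = robust_loss + beta * consistency
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the reweighter acts as the self-paced scheduler: while a class pair keeps being confused under attack, its weight stays elevated, and once the model learns to separate the pair, the exponential moving average lets the weight decay back toward 1.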