Paper Title

Prior-Guided Adversarial Initialization for Fast Adversarial Training

Paper Authors

Xiaojun Jia, Yong Zhang, Xingxing Wei, Baoyuan Wu, Ke Ma, Jue Wang, Xiaochun Cao

Paper Abstract

Fast adversarial training (FAT) effectively improves the efficiency of standard adversarial training (SAT). However, initial FAT encounters catastrophic overfitting, i.e., the robust accuracy against adversarial attacks suddenly and dramatically decreases. Though several FAT variants spare no effort to prevent overfitting, they sacrifice considerable computational cost. In this paper, we explore the difference between the training processes of SAT and FAT and observe that the attack success rate of the adversarial examples (AEs) in FAT gradually deteriorates in the late training stage, resulting in overfitting. The AEs are generated by the fast gradient sign method (FGSM) with a zero or random initialization. Based on this observation, and after investigating several initialization strategies, we propose a prior-guided FGSM initialization method that avoids overfitting and improves the quality of the AEs throughout the training process. The initialization is formed by leveraging historically generated AEs without additional computational cost. We further provide a theoretical analysis of the proposed initialization method. We also propose a simple yet effective regularizer based on the prior-guided initialization, i.e., the currently generated perturbation should not deviate too much from the prior-guided initialization. The regularizer adopts both historical and current adversarial perturbations to guide the model learning. Evaluations on four datasets demonstrate that the proposed method can prevent catastrophic overfitting and outperform state-of-the-art FAT methods. The code is released at https://github.com/jiaxiaojunQAQ/FGSM-PGI.
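
The sketch below illustrates the idea described in the abstract: take the FGSM step from a perturbation saved during an earlier pass (the prior) instead of a zero or random start, and add a regularizer that keeps the current perturbation's effect close to that prior. It is a minimal PyTorch-style reading of the abstract, not the paper's exact formulation; the function and parameter names (fgsm_prior_guided_step, prior_delta, lambda_reg) and the squared-error form of the regularizer are illustrative assumptions.

import torch
import torch.nn.functional as F

def fgsm_prior_guided_step(model, x, y, prior_delta,
                           eps=8/255, alpha=8/255, lambda_reg=10.0):
    # Initialize from the historically generated perturbation (the prior)
    # instead of a zero or random start.
    delta = prior_delta.clone().detach().requires_grad_(True)

    loss = F.cross_entropy(model(x + delta), y)
    grad = torch.autograd.grad(loss, delta)[0]

    # One FGSM step from the prior-guided starting point, projected back to the
    # eps-ball and clipped so x + delta stays a valid image.
    delta = torch.clamp(delta + alpha * grad.sign(), -eps, eps)
    delta = (torch.clamp(x + delta, 0.0, 1.0) - x).detach()

    # Training loss: adversarial cross-entropy plus a regularizer tying the
    # output on the current AE to the output on the prior-initialized one
    # (one plausible reading of "should not deviate too much from the
    # prior-guided initialization").
    logits_cur = model(x + delta)
    logits_prior = model(x + prior_delta.detach())
    loss_train = F.cross_entropy(logits_cur, y) \
        + lambda_reg * F.mse_loss(logits_cur, logits_prior)

    # The returned delta would be stored and reused as the prior the next time
    # this batch is visited, so the initialization costs no extra attack steps.
    return loss_train, delta

In a training loop, one would keep a buffer of per-example perturbations, pass the stored entry as prior_delta, backpropagate loss_train, and overwrite the buffer with the returned delta.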
