Paper title
Guidance Through Surrogate: Towards a Generic Diagnostic Attack
Paper authors
Paper abstract
Adversarial training is an effective approach to make deep neural networks robust against adversarial attacks. Recently, different adversarial training defenses have been proposed that not only maintain high clean accuracy but also show significant robustness against popular and well-studied adversarial attacks such as PGD. However, high adversarial robustness can also arise if an attack fails to find adversarial gradient directions, a phenomenon known as `gradient masking'. In this work, we analyse the effect of label smoothing on adversarial training as one of the potential causes of gradient masking. We then develop a guided mechanism to avoid local minima during attack optimization, leading to a novel attack dubbed Guided Projected Gradient Attack (G-PGA). Our attack approach is based on a `match and deceive' loss that finds optimal adversarial directions through guidance from a surrogate model. Our modified attack requires neither random restarts nor a large number of attack iterations, and does not need a search for an optimal step size. Furthermore, our proposed G-PGA is generic, so it can be combined with an ensemble attack strategy, as we demonstrate for the case of Auto-Attack, leading to improvements in efficiency and convergence speed. Beyond being an effective attack, G-PGA can serve as a diagnostic tool to reveal elusive robustness caused by gradient masking in adversarial defenses.
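The abstract does not give the exact form of the `match and deceive' loss, but the core idea of guiding PGD with a surrogate model can be sketched on a toy differentiable model. Below is a minimal, hypothetical illustration (not the paper's actual method): a standard L-infinity PGD loop on a linear logistic model, where the ascent direction blends gradients from the defended (target) model with gradients from a surrogate model via an assumed mixing weight `mix`.

```python
import numpy as np

def loss_grad(w, x, y):
    # Gradient of the logistic loss w.r.t. the input x for a linear model w.
    p = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    return (p - y) * w

def guided_pgd(x0, y, w_target, w_surrogate, eps=0.5, alpha=0.1, steps=10, mix=0.5):
    # PGD on the target model, with the ascent direction blended with
    # surrogate-model gradients; the intuition is that surrogate gradients
    # can point out of flat (masked) regions of the target's loss surface.
    x = x0.copy()
    for _ in range(steps):
        g = (1 - mix) * loss_grad(w_target, x, y) + mix * loss_grad(w_surrogate, x, y)
        x = x + alpha * np.sign(g)            # gradient-ascent step on the blended loss
        x = x0 + np.clip(x - x0, -eps, eps)   # project back into the eps-ball around x0
    return x
```

Here `mix`, `eps`, `alpha`, and the linear models are all illustrative placeholders; the paper's G-PGA operates on deep networks and a specific `match and deceive' objective.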