Paper Title

Evolution of Neural Tangent Kernels under Benign and Adversarial Training

Paper Authors

Noel Loo, Ramin Hasani, Alexander Amini, Daniela Rus

Paper Abstract

Two key challenges facing modern deep learning are mitigating deep networks' vulnerability to adversarial attacks and understanding deep learning's generalization capabilities. Towards the first issue, many defense strategies have been developed, the most common being Adversarial Training (AT). Towards the second challenge, one of the dominant theories to emerge is the Neural Tangent Kernel (NTK) -- a characterization of neural network behavior in the infinite-width limit. In this limit, the kernel is frozen and the underlying feature map is fixed. At finite widths, however, there is evidence that feature learning happens in the early stages of training (kernel learning), before a second phase in which the kernel remains fixed (lazy training). While prior work has studied adversarial vulnerability through the lens of the frozen infinite-width NTK, no work has studied the adversarial robustness of the empirical/finite NTK during training. In this work, we perform an empirical study of the evolution of the empirical NTK under standard and adversarial training, aiming to disambiguate the effects of adversarial training on kernel learning and lazy training. We find that under adversarial training, the empirical NTK rapidly converges to a different kernel (and feature map) than under standard training. This new kernel provides adversarial robustness, even when non-robust training is performed on top of it. Furthermore, we find that adversarial training on top of a fixed kernel can yield a classifier with $76.1\%$ robust accuracy under PGD attacks with $\varepsilon = 4/255$ on CIFAR-10.
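
The central object in the abstract is the empirical NTK, $\Theta(x_1, x_2) = J(x_1) J(x_2)^\top$, where $J$ is the Jacobian of the network outputs with respect to the parameters. Below is a minimal sketch of computing it for a toy MLP in JAX; the architecture, layer widths, and random data are illustrative assumptions, not the paper's setup (the paper tracks this kernel on CIFAR-10 networks during training).

```python
# Minimal sketch (not the authors' code): empirical NTK of a toy MLP in JAX.
import jax
import jax.numpy as jnp

def init_params(key, sizes=(32, 64, 64, 10)):
    """Plain MLP; the layer sizes are an assumption for illustration."""
    params = []
    for din, dout in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (din, dout)) / jnp.sqrt(din),
                       jnp.zeros(dout)))
    return params

def apply_fn(params, x):
    for W, b in params[:-1]:
        x = jax.nn.relu(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def empirical_ntk(params, x1, x2):
    """Theta(x1, x2) = J(x1) J(x2)^T, Jacobians taken w.r.t. the parameters."""
    j1 = jax.jacobian(apply_fn)(params, x1)  # pytree of per-parameter Jacobians
    j2 = jax.jacobian(apply_fn)(params, x2)

    def contract(a, b):
        a = a.reshape(*a.shape[:2], -1)  # (n1, out, n_leaf_params)
        b = b.reshape(*b.shape[:2], -1)  # (n2, out, n_leaf_params)
        return jnp.einsum('iap,jbp->ijab', a, b)

    # Sum the contraction over every parameter leaf (weights and biases).
    return sum(contract(a, b)
               for a, b in zip(jax.tree_util.tree_leaves(j1),
                               jax.tree_util.tree_leaves(j2)))

key = jax.random.PRNGKey(0)
params = init_params(key)
x1 = jax.random.uniform(key, (4, 32))  # toy "images" with values in [0, 1)
x2 = jax.random.uniform(key, (3, 32))
print(empirical_ntk(params, x1, x2).shape)  # (4, 3, 10, 10)
```

Tracking how this kernel changes across training checkpoints, rather than assuming it is frozen, is what the paper means by studying the NTK's evolution.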
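
The robust-accuracy figure is quoted under an $L_\infty$ PGD attack with $\varepsilon = 4/255$. Reusing the toy network from the sketch above, here is a minimal PGD sketch at that budget; the step size, step count, random start, and loss are assumptions, not the paper's exact evaluation protocol.

```python
def pgd_attack(loss_fn, x, y, eps=4/255, step_size=1/255, n_steps=10, key=None):
    """Untargeted L-infinity PGD: ascend the loss, project back into the eps-ball."""
    x_adv = x
    if key is not None:  # optional random start inside the ball
        x_adv = x + jax.random.uniform(key, x.shape, minval=-eps, maxval=eps)
    grad_fn = jax.grad(lambda xa: loss_fn(xa, y))
    for _ in range(n_steps):
        x_adv = x_adv + step_size * jnp.sign(grad_fn(x_adv))  # gradient ascent step
        x_adv = jnp.clip(x_adv, x - eps, x + eps)             # project to the eps-ball
        x_adv = jnp.clip(x_adv, 0.0, 1.0)                     # keep inputs image-valued
    return x_adv

def xent_loss(x, y):
    """Cross-entropy of the toy MLP above; stands in for the trained classifier."""
    logits = apply_fn(params, x)
    return -jnp.mean(jnp.sum(jax.nn.log_softmax(logits) *
                             jax.nn.one_hot(y, 10), axis=-1))

y = jnp.zeros(x1.shape[0], dtype=jnp.int32)  # placeholder labels for the demo
x_adv = pgd_attack(xent_loss, x1, y, key=key)
```

In adversarial training, each minibatch is replaced by such PGD perturbations of itself before the gradient step; the paper's finding is that this changes which kernel the network converges to, not merely the function fit on top of it.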
