Paper Title
Threat Model-Agnostic Adversarial Defense using Diffusion Models
Paper Authors
Paper Abstract
Deep Neural Networks (DNNs) are highly sensitive to imperceptible malicious perturbations, known as adversarial attacks. Following the discovery of this vulnerability in real-world imaging and vision applications, the associated safety concerns have attracted vast research attention, and many defense techniques have been developed. Most of these defense methods rely on adversarial training (AT) -- training the classification network on images perturbed according to a specific threat model, which defines the magnitude of the allowed modification. Although AT leads to promising results, training on a specific threat model fails to generalize to other types of perturbations. A different approach utilizes a preprocessing step to remove the adversarial perturbation from the attacked image. In this work, we follow the latter path and aim to develop a technique that yields classifiers that are robust across various realizations of threat models. To this end, we harness recent advances in stochastic generative modeling and leverage them for sampling from conditional distributions. Our defense relies on the addition of Gaussian i.i.d. noise to the attacked image, followed by a pretrained diffusion process -- an architecture that performs a stochastic iterative process over a denoising network, yielding a denoised outcome of high perceptual quality. The robustness obtained with this stochastic preprocessing step is validated through extensive experiments on the CIFAR-10 dataset, showing that our method outperforms the leading defense methods under various threat models.
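To make the described preprocessing concrete, below is a minimal sketch of such a purification step, assuming PyTorch, a standard DDPM-style noise schedule, and hypothetical `denoiser` (a pretrained epsilon-prediction network) and `classifier` callables; it is an illustration of the general noise-then-denoise idea under these assumptions, not the paper's exact implementation.

```python
import torch

def diffusion_purify(x_adv, denoiser, t_star, betas):
    """Noise an attacked image to an intermediate timestep, then denoise it back.

    x_adv:    attacked image batch in [-1, 1], shape (B, C, H, W)
    denoiser: epsilon-prediction network eps_theta(x_t, t) (assumed pretrained)
    t_star:   intermediate timestep at which Gaussian i.i.d. noise is injected
    betas:    DDPM noise schedule, shape (T,)
    """
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    # Forward step: add Gaussian i.i.d. noise to jump directly to timestep t_star.
    a_bar = alpha_bar[t_star]
    x_t = a_bar.sqrt() * x_adv + (1 - a_bar).sqrt() * torch.randn_like(x_adv)

    # Reverse (stochastic) diffusion from t_star down to 0.
    for t in range(t_star, -1, -1):
        t_batch = torch.full((x_t.shape[0],), t, device=x_t.device, dtype=torch.long)
        eps = denoiser(x_t, t_batch)
        a_t, a_bar_t = alphas[t], alpha_bar[t]
        # Posterior mean of the DDPM reverse step.
        mean = (x_t - (1 - a_t) / (1 - a_bar_t).sqrt() * eps) / a_t.sqrt()
        if t > 0:
            x_t = mean + betas[t].sqrt() * torch.randn_like(x_t)
        else:
            x_t = mean
    return x_t  # purified image, handed to the downstream classifier

# Usage (hypothetical names): classify the purified image with any off-the-shelf
# CIFAR-10 classifier.
# logits = classifier(diffusion_purify(x_adv, denoiser, t_star=100, betas=betas))
```

Because both the noise injection and the reverse process are stochastic, repeated runs yield different purified samples, which is what makes the preprocessing independent of any particular threat model.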