Paper Title
Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain
Paper Authors
Paper Abstract
With the broad application of deep neural networks (DNNs), backdoor attacks have gradually attracted attention. Backdoor attacks are insidious: a poisoned model performs well on benign samples and misbehaves only when given specific trigger inputs, which cause the network to produce incorrect outputs. State-of-the-art backdoor attacks are implemented through data poisoning, i.e., the attacker injects poisoned samples into the dataset, and models trained on that dataset are infected with the backdoor. However, most triggers used in existing work are fixed patterns patched onto a small region of the image, and the poisoned samples are often clearly mislabeled, which is easily detected by humans or by defense methods such as Neural Cleanse and SentiNet. Moreover, such triggers are difficult for DNNs to learn without mislabeling, since the networks may simply ignore small patterns. In this paper, we propose a generalized backdoor attack method based on the frequency domain, which implants a backdoor without mislabeling or access to the training process. The trigger is invisible to humans and able to evade commonly used defense methods. We evaluate our approach in the no-label and clean-label cases on three datasets (CIFAR-10, STL-10, and GTSRB) under two popular training scenarios (self-supervised learning and supervised learning). The results show that our approach achieves a high attack success rate (above 90%) on all tasks without significant performance degradation on the main tasks. We also evaluate the bypass performance of our approach against different kinds of defenses, including detection of training data (i.e., Activation Clustering), preprocessing of inputs (i.e., Filtering), detection of inputs (i.e., SentiNet), and detection of models (i.e., Neural Cleanse). The experimental results demonstrate that our approach shows excellent robustness against such defenses.
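The abstract does not spell out how the frequency-domain trigger is constructed, so the following is only a minimal sketch of the general idea it describes (data poisoning with an imperceptible perturbation applied in the frequency domain, optionally in a clean-label setting). The DCT coefficient positions, perturbation strength, and poisoning rate below are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch of a frequency-domain poisoning step (NOT the paper's exact method).
import numpy as np
from scipy.fft import dctn, idctn


def add_frequency_trigger(img, coeff_positions=((15, 15), (31, 31)), strength=30.0):
    """img: float32 HxWxC array in [0, 255]; returns a visually similar poisoned copy.

    The trigger is a small additive shift on a few mid-frequency DCT
    coefficients of each channel, so it is hard to see in pixel space.
    """
    poisoned = img.copy()
    for c in range(img.shape[2]):
        coeffs = dctn(img[:, :, c], norm="ortho")      # 2D DCT of one channel
        for (u, v) in coeff_positions:                 # assumed trigger locations
            coeffs[u, v] += strength                   # small, imperceptible shift
        poisoned[:, :, c] = idctn(coeffs, norm="ortho")
    return np.clip(poisoned, 0.0, 255.0)


def poison_dataset(images, labels, target_label, rate=0.05, clean_label=True):
    """Poison a fraction `rate` of the training set.

    In the clean-label case only samples that already carry the target label
    are modified (no mislabeling); otherwise the labels of the poisoned
    samples are flipped to the target label.
    """
    rng = np.random.default_rng(0)
    candidates = (np.where(labels == target_label)[0]
                  if clean_label else np.arange(len(images)))
    n_poison = min(int(rate * len(images)), len(candidates))
    chosen = rng.choice(candidates, size=n_poison, replace=False)
    for i in chosen:
        images[i] = add_frequency_trigger(images[i])
        if not clean_label:
            labels[i] = target_label
    return images, labels
```

At inference time, applying the same frequency-domain perturbation to any benign input would (under this sketch) steer an infected model toward the target label, while unmodified inputs are classified normally.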