Paper Title
BadHash: Invisible Backdoor Attacks against Deep Hashing with Clean Label
Paper Authors
Paper Abstract
Owing to its powerful feature-learning capability and high efficiency, deep hashing has achieved great success in large-scale image retrieval. Meanwhile, extensive works have demonstrated that deep neural networks (DNNs) are susceptible to adversarial examples, and adversarial attacks against deep hashing have attracted considerable research effort. Nevertheless, backdoor attacks, another well-known threat to DNNs, have not yet been studied for deep hashing. Although various backdoor attacks have been proposed for image classification, existing approaches fail to realize a truly imperceptible backdoor attack that enjoys invisible triggers and a clean-label setting simultaneously, and they also cannot meet the intrinsic demands of backdoor attacks on image retrieval. In this paper, we propose BadHash, the first generative imperceptible backdoor attack against deep hashing, which can effectively generate invisible, input-specific poisoned images with clean labels. Specifically, we first propose a new conditional generative adversarial network (cGAN) pipeline to efficiently produce poisoned samples: for any given benign image, it seeks to generate a natural-looking poisoned counterpart carrying a unique invisible trigger. To improve attack effectiveness, we introduce a label-based contrastive learning network, LabCLN, that exploits the semantic characteristics of different labels, which are subsequently used to confuse and mislead the target model into learning the embedded trigger. We finally explore the mechanism of backdoor attacks on image retrieval in the hash space. Extensive experiments on multiple benchmark datasets verify that BadHash can generate imperceptible poisoned samples with strong attack ability and transferability against state-of-the-art deep hashing schemes.
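To make the hash-space attack surface concrete, the sketch below illustrates how deep hashing retrieval works and what a successful backdoor would do to it: a model maps images to binary codes, the database is ranked by Hamming distance, and a triggered query is pushed toward the target label's code region so retrieval returns the attacker's chosen class. This is a minimal toy illustration under assumptions, not the paper's implementation; the helper names (`to_hash_code`, `hamming_distance`), the random toy database, and the idealized "triggered query lands at the target label's code center" behavior are hypothetical stand-ins for BadHash's actual cGAN generator and LabCLN training, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_hash_code(features):
    """Binarize continuous network outputs into {-1, +1} hash codes
    (the standard sign-based quantization used in deep hashing)."""
    return np.where(features >= 0, 1, -1).astype(int)

def hamming_distance(query_code, db_codes):
    """Hamming distance between one K-bit code and N database codes.
    For {-1, +1} codes: d_H(a, b) = (K - a . b) / 2."""
    K = query_code.shape[0]
    return (K - db_codes @ query_code) // 2

# Toy database: 1000 items with 64-bit codes and 10 class labels
# (stand-ins for the hash codes a trained deep hashing model would emit).
K, N = 64, 1000
db_codes = to_hash_code(rng.standard_normal((N, K)))
db_labels = rng.integers(0, 10, size=N)

# Benign query: retrieval ranks the database by Hamming distance.
query = to_hash_code(rng.standard_normal(K))
ranking = np.argsort(hamming_distance(query, db_codes))
print("top-10 labels (benign):   ", db_labels[ranking[:10]])

# Backdoored behavior (idealized): a backdoored model maps any query
# carrying the trigger to a code near the target label's cluster center,
# so the top of the ranking is dominated by the attacker's chosen label.
target_label = 3
target_center = to_hash_code(
    db_codes[db_labels == target_label].sum(axis=0)
)
triggered_query = target_center  # assumption: trigger collapses the code
ranking = np.argsort(hamming_distance(triggered_query, db_codes))
print("top-10 labels (triggered):", db_labels[ranking[:10]])
```

In this toy setting the triggered query's top results all carry the target label, which is the retrieval-level analogue of a targeted backdoor in classification; BadHash's contribution is achieving this effect with invisible, input-specific triggers on clean-label poisoned training images, which the sketch does not model.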