Title
TrojViT: Trojan Insertion in Vision Transformers
Authors
Abstract
Vision Transformers (ViTs) have demonstrated state-of-the-art performance in various vision-related tasks. The success of ViTs motivates adversaries to perform backdoor attacks on them. Although the vulnerability of traditional CNNs to backdoor attacks is well known, backdoor attacks on ViTs are seldom studied. While CNNs capture pixel-wise local features by convolutions, ViTs extract global context information through patches and attention. Naïvely transplanting CNN-specific backdoor attacks to ViTs yields only low clean data accuracy and a low attack success rate. In this paper, we propose a stealthy and practical ViT-specific backdoor attack, $TrojViT$. Rather than the area-wise trigger used by CNN-specific backdoor attacks, TrojViT generates a patch-wise trigger designed to build a Trojan composed of a few vulnerable bits in the parameters of a ViT stored in DRAM, through patch salience ranking and an attention-target loss. TrojViT further uses a minimum-tuned parameter update to reduce the number of bits in the Trojan. Once the attacker inserts the Trojan into the ViT model by flipping these vulnerable bits, the model still produces normal inference accuracy on benign inputs. But when the attacker embeds the trigger into an input, the model is forced to classify that input to a predefined target class. We show that flipping only a few vulnerable bits identified by TrojViT on a ViT model using the well-known RowHammer attack can transform the model into a backdoored one. We perform extensive experiments on multiple datasets with various ViT models. TrojViT can classify $99.64\%$ of test images to a target class by flipping only $345$ bits of a ViT trained on ImageNet. Our code is available at https://github.com/mxzheng/TrojViT
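The patch salience ranking mentioned in the abstract can be illustrated with a minimal sketch (this is not the authors' implementation; the function name, the gradient-aggregation rule, and the defaults below are assumptions for illustration): score each ViT input patch by the accumulated gradient magnitude of the attack loss over that patch's pixels, then pick the top-k patches as candidate trigger locations.

```python
import numpy as np

def rank_patch_salience(grad, patch_size=16, top_k=9):
    """Rank image patches by saliency for trigger placement (illustrative sketch).

    grad: (H, W) map of |d(attack loss)/d(pixel)| for one input image.
    Returns the indices (row-major over the patch grid) of the top_k
    patches with the largest accumulated gradient magnitude.
    """
    H, W = grad.shape
    ph, pw = H // patch_size, W // patch_size
    # Cut the gradient map into non-overlapping patches:
    # (ph, pw, patch_size, patch_size) -> (ph*pw, patch_size*patch_size)
    patches = grad[:ph * patch_size, :pw * patch_size].reshape(
        ph, patch_size, pw, patch_size
    ).transpose(0, 2, 1, 3).reshape(ph * pw, -1)
    # Score each patch by total absolute gradient, then take the top_k.
    scores = np.abs(patches).sum(axis=1)
    return np.argsort(scores)[::-1][:top_k]
```

For a 224x224 input with 16x16 patches this ranks the 14x14 = 196 patch positions; the attacker would embed the trigger only into the highest-ranked patches, which is what makes the trigger patch-wise rather than area-wise.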