Paper Title

Rethinking the Reverse-engineering of Trojan Triggers

Authors

Zhenting Wang, Kai Mei, Hailun Ding, Juan Zhai, Shiqing Ma

Abstract

Deep neural networks are vulnerable to Trojan (or backdoor) attacks. Reverse-engineering methods can reconstruct the trigger and thus identify affected models. Existing reverse-engineering methods consider only input-space constraints, e.g., trigger size in the input space. Specifically, they assume that triggers are static patterns in the input space, so they fail to detect models with feature-space triggers such as image style transformations. We observe that both input-space and feature-space Trojans are associated with feature-space hyperplanes. Based on this observation, we design a novel reverse-engineering method that exploits the feature-space constraint to reverse-engineer Trojan triggers. Results on four datasets and seven different attacks demonstrate that our solution effectively defends against both input-space and feature-space Trojans. It outperforms state-of-the-art reverse-engineering methods and other types of defenses in both Trojaned model detection and mitigation tasks. On average, the detection accuracy of our method is 93%. For Trojan mitigation, our method can reduce the ASR (attack success rate) to only 0.26% with the BA (benign accuracy) remaining nearly unchanged. Our code can be found at https://github.com/RU-System-Software-and-Security/FeatureRE.
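
The two mitigation metrics named in the abstract, ASR and BA, can be illustrated with a minimal sketch. The function names and the toy predictions below are hypothetical and are not taken from the FeatureRE code. The sketch also assumes the common convention of excluding samples whose true label already equals the attack target when computing ASR; the abstract does not say which convention the authors use.

```python
def benign_accuracy(clean_preds, clean_labels):
    """BA: fraction of clean (trigger-free) inputs classified correctly."""
    correct = sum(p == y for p, y in zip(clean_preds, clean_labels))
    return correct / len(clean_labels)


def attack_success_rate(triggered_preds, true_labels, target_label):
    """ASR: fraction of triggered inputs classified as the attack target.

    Assumption: samples whose true label is already the target are skipped,
    since predicting the target for them is not evidence of a backdoor.
    """
    pairs = [(p, y) for p, y in zip(triggered_preds, true_labels)
             if y != target_label]
    hits = sum(p == target_label for p, _ in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    # Toy predictions, invented purely for illustration.
    print(benign_accuracy([0, 1, 2, 2], [0, 1, 2, 1]))
    print(attack_success_rate([0, 0, 1, 0], [1, 2, 1, 0], target_label=0))
```

Under these definitions, a successful mitigation drives ASR toward zero while leaving BA close to its value before mitigation.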
