Defense Against Multi-target Trojan Attacks

Paper Title

Defense Against Multi-target Trojan Attacks

Authors

Haripriya Harikumar, Santu Rana, Kien Do, Sunil Gupta, Wei Zong, Willy Susilo, Svetha Venkatesh

Abstract

Adversarial attacks on deep learning-based models pose a significant threat to current AI infrastructure. Among them, Trojan attacks are the hardest to defend against. In this paper, we first introduce a variation of the Badnet kind of attack that introduces Trojan backdoors to multiple target classes and allows triggers to be placed anywhere in the image. The former makes the attack more potent, and the latter makes it extremely easy to carry out in physical space. State-of-the-art Trojan detection methods fail under this threat model. To defend against this attack, we first introduce a trigger reverse-engineering mechanism that uses multiple images to recover a variety of potential triggers. We then propose a detection mechanism that measures the transferability of the recovered triggers. A Trojan trigger has very high transferability, i.e., it causes other images to be classified into the same class as well. We study many practical advantages of our attack method and then demonstrate the detection performance on a variety of image datasets. The experimental results show that our method achieves superior detection performance over state-of-the-art methods.
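
The transferability idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the functions `apply_trigger`, `transferability`, and `toy_predict`, the fixed top-left patch location, and the toy "Trojaned" classifier are all assumptions made for illustration. The sketch stamps a candidate trigger onto a batch of images and reports the fraction that land in the single most common predicted class.

```python
import numpy as np

def apply_trigger(images, trigger, mask):
    """Overwrite the masked region of every image with the candidate trigger."""
    return images * (1.0 - mask) + trigger * mask

def transferability(predict, images, trigger, mask):
    """Return (most common predicted class, fraction of stamped images assigned to it)."""
    preds = predict(apply_trigger(images, trigger, mask))
    classes, counts = np.unique(preds, return_counts=True)
    top = int(np.argmax(counts))
    return int(classes[top]), float(counts[top] / len(preds))

def toy_predict(batch):
    """Hypothetical stand-in for a Trojaned model (assumption, not from the paper).

    Clean behaviour: class 0..3, chosen by the brightest 2-column block
    in the bottom half of the image.
    Backdoor: class 7 whenever the top-left 2x2 patch is saturated.
    """
    n = len(batch)
    triggered = batch[:, :2, :2].reshape(n, -1).min(axis=1) > 0.99
    block_sums = batch[:, 4:, :].reshape(n, 4, 4, 2).sum(axis=(1, 3))
    clean = block_sums.argmax(axis=1)
    return np.where(triggered, 7, clean)

rng = np.random.default_rng(0)
images = rng.uniform(0.0, 0.9, size=(100, 8, 8))  # clean images never saturate the patch

mask = np.zeros((8, 8))
mask[:2, :2] = 1.0                      # candidate trigger occupies the top-left corner
trojan_trigger = np.ones((8, 8))        # saturated patch: activates the toy backdoor
benign_patch = np.full((8, 8), 0.5)     # grey patch: should not transfer

trojan_class, trojan_score = transferability(toy_predict, images, trojan_trigger, mask)
benign_class, benign_score = transferability(toy_predict, images, benign_patch, mask)
print("trojan:", trojan_class, trojan_score)
print("benign:", benign_class, benign_score)
```

In this toy setup the saturated patch sends every image to class 7, so its score is 1.0, while the grey patch leaves predictions spread across the clean classes. A detector along these lines would flag candidate triggers whose score exceeds some threshold. How that threshold is chosen, and how the candidate triggers are reverse-engineered in the first place, is not specified in the abstract and is left out here.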
