Paper Title
Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations
Paper Authors
Abstract
In this paper, we propose a novel and practical mechanism that enables a service provider to verify whether a suspect model has been stolen from the victim model via model extraction attacks. Our key insight is that the profile of a DNN model's decision boundary can be uniquely characterized by its Universal Adversarial Perturbations (UAPs). UAPs lie in a low-dimensional subspace, and the subspaces of piracy models are more consistent with the victim model's subspace than those of non-piracy models. Based on this, we propose a UAP fingerprinting method for DNN models and train an encoder via contrastive learning that takes fingerprints as input and outputs a similarity score. Extensive studies show that our framework can detect model IP breaches with confidence > 99.99% using only 20 fingerprints of the suspect model. It generalizes well across different model architectures and is robust against post-modifications of stolen models.
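The abstract's core claim — that a piracy model's UAP subspace aligns more closely with the victim's than a non-piracy model's does — can be illustrated with a minimal sketch. This is not the paper's actual fingerprinting pipeline; it only assumes we already have a matrix of flattened UAP vectors per model, and measures subspace alignment via the cosines of principal angles (values near 1 indicate highly aligned subspaces):

```python
import numpy as np

def uap_subspace(uaps, k):
    """Orthonormal basis of the top-k subspace spanned by a model's UAPs.

    uaps: (n, d) array, one flattened UAP per row (hypothetical input).
    """
    _, _, vt = np.linalg.svd(uaps, full_matrices=False)
    return vt[:k].T  # (d, k), orthonormal columns

def subspace_alignment(uaps_a, uaps_b, k=5):
    """Mean cosine of the principal angles between two k-dim UAP subspaces.

    The singular values of Qa^T Qb are exactly the cosines of the
    principal angles between the two subspaces.
    """
    qa, qb = uap_subspace(uaps_a, k), uap_subspace(uaps_b, k)
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return float(cosines.mean())

# Toy demonstration: a "piracy" model's UAPs are a slightly perturbed
# copy of the victim's, while a "non-piracy" model's are unrelated.
rng = np.random.default_rng(0)
victim = rng.standard_normal((20, 50))
piracy = victim + 0.05 * rng.standard_normal((20, 50))
non_piracy = rng.standard_normal((20, 50))

print(subspace_alignment(victim, piracy))      # close to 1
print(subspace_alignment(victim, non_piracy))  # substantially lower
```

In the full framework, this raw geometric alignment is replaced by a contrastive-learning encoder that maps fingerprints to a learned similarity score, but the toy comparison captures the intuition behind why UAP subspaces can serve as fingerprints.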