Paper Title
Deep Fidelity in DNN Watermarking: A Study of Backdoor Watermarking for Classification Models
Paper Authors
Paper Abstract
Backdoor watermarking is a promising paradigm for protecting the copyright of deep neural network (DNN) models. In existing works on this subject, researchers have focused intensively on watermark robustness, while the concept of fidelity, which concerns the preservation of the model's original functionality, has received less attention. In this paper, focusing on deep image classification models, we show that the existing shared notion of fidelity, measured solely by learning accuracy, is inadequate to characterize backdoor fidelity. Meanwhile, we show that the concept analogous to embedding distortion in multimedia watermarking, interpreted as the total weight loss (TWL) in DNN backdoor watermarking, is also problematic for fidelity measurement. To address this challenge, we propose the concept of deep fidelity, which states that a backdoor-watermarked DNN model should preserve both the feature representation and the decision boundary of the unwatermarked host model. To achieve deep fidelity, we propose two loss functions, termed penultimate feature loss (PFL) and softmax probability-distribution loss (SPL), to preserve the feature representation, while the decision boundary is preserved by the proposed fix last layer (FixLL) treatment, inspired by the recent discovery that deep learning with a fixed classifier causes no loss of learning accuracy. With the above designs, both embedding-from-scratch and fine-tuning strategies are implemented to evaluate the deep fidelity of backdoor embedding, and their advantages over existing methods are verified via experiments using ResNet18 for MNIST and CIFAR-10 classification and a wide residual network (i.e., WRN28_10) for the CIFAR-100 task. PyTorch code is available at https://github.com/ghua-ac/dnn_watermark.
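To make the abstract's three fidelity components concrete, below is a minimal PyTorch sketch of how PFL, SPL, and FixLL might be realized. The MSE form of PFL, the KL-divergence form of SPL, and the `model.fc` classifier attribute are illustrative assumptions, not the paper's exact implementation; see the linked repository for the authoritative code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pfl(feat_wm: torch.Tensor, feat_host: torch.Tensor) -> torch.Tensor:
    """Penultimate feature loss (PFL): keep the watermarked model's
    penultimate-layer features close to those of the frozen host model
    (MSE is an assumed choice of distance)."""
    return F.mse_loss(feat_wm, feat_host)

def spl(logits_wm: torch.Tensor, logits_host: torch.Tensor) -> torch.Tensor:
    """Softmax probability-distribution loss (SPL): match the two models'
    softmax output distributions (KL divergence is an assumed choice)."""
    return F.kl_div(F.log_softmax(logits_wm, dim=1),
                    F.softmax(logits_host, dim=1),
                    reduction="batchmean")

def fix_last_layer(model: nn.Module) -> None:
    """FixLL: freeze the final classification layer so the decision
    boundary in feature space stays fixed during backdoor embedding.
    Assumes the classifier head is exposed as `model.fc`
    (true for torchvision's ResNet)."""
    for p in model.fc.parameters():
        p.requires_grad = False
```

During fine-tuning-based embedding, these terms would plausibly be weighted and added to the cross-entropy loss on trigger samples, with the host model run in `eval()` mode under `torch.no_grad()` to supply the reference features and logits.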