Paper Title

Towards Lightweight Black-Box Attacks against Deep Neural Networks

Paper Authors

Chenghao Sun, Yonggang Zhang, Chaoqun Wan, Qizhou Wang, Ya Li, Tongliang Liu, Bo Han, Xinmei Tian

Paper Abstract

Black-box attacks can generate adversarial examples without accessing the parameters of the target model, largely exacerbating the threat to deployed deep neural networks (DNNs). However, previous works state that black-box attacks fail to mislead target models when their training data and outputs are inaccessible. In this work, we argue that black-box attacks can mount practical attacks in this extremely restrictive scenario, where only a few test samples are available. Specifically, we find that attacking the shallow layers of DNNs trained on a few test samples can generate powerful adversarial examples. Since only a few samples are required, we refer to these attacks as lightweight black-box attacks. The main challenge in promoting lightweight attacks is mitigating the adverse impact caused by the approximation error of the shallow layers. As it is hard to reduce the approximation error with few available samples, we propose the Error TransFormer (ETF) for lightweight attacks. Namely, ETF transforms the approximation error in the parameter space into a perturbation in the feature space and alleviates the error by disturbing the features. In experiments, lightweight black-box attacks with the proposed ETF achieve surprising results. For example, even with only 1 sample per category available, the attack success rate of lightweight black-box attacks is only about 3% lower than that of black-box attacks with complete training data.
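The abstract describes the method only at a high level. Below is a minimal, illustrative PyTorch sketch of the general idea it outlines: maximizing the distortion of shallow-layer features of a surrogate trained on a few test samples, while treating the surrogate's approximation error as a bounded perturbation in feature space that the attack must remain effective against. All names and hyperparameters here (`etf_style_attack`, `surrogate_shallow`, `rho`, the inner min-max loop, the step sizes) are assumptions for illustration, not the authors' released implementation.

```python
import torch

def etf_style_attack(surrogate_shallow, x, epsilon=8/255, rho=0.05,
                     steps=40, step_size=2/255, inner_steps=5):
    """Illustrative sketch of a lightweight, ETF-style black-box attack.

    surrogate_shallow: module mapping images to shallow-layer features,
                       trained on only a few test samples (the surrogate).
    epsilon:           L-inf budget for the image perturbation.
    rho:               budget for the feature-space perturbation that stands
                       in for the surrogate's approximation error (assumed).
    """
    x_adv = x.clone().detach()
    feat_clean = surrogate_shallow(x).detach()

    for _ in range(steps):
        x_adv.requires_grad_(True)
        feat_adv = surrogate_shallow(x_adv)

        # Inner step: find a feature-space perturbation delta (a proxy for the
        # approximation error) that most weakens the current feature distortion.
        delta = torch.zeros_like(feat_adv)
        for _ in range(inner_steps):
            delta.requires_grad_(True)
            dist = (feat_adv + delta - feat_clean).flatten(1).norm(dim=1).mean()
            grad_d, = torch.autograd.grad(dist, delta, retain_graph=True)
            delta = (delta - (rho / inner_steps) * grad_d.sign()).detach()

        # Outer step: maximize the shallow-feature distortion despite delta.
        loss = (feat_adv + delta - feat_clean).flatten(1).norm(dim=1).mean()
        grad_x, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step_size * grad_x.sign()
            x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)
            x_adv = x_adv.clamp(0, 1)
        x_adv = x_adv.detach()

    return x_adv
```

The min-max structure above is one simple way to realize the abstract's idea of "alleviating the error by disturbing features": perturbations that stay effective under the worst-case feature perturbation are less sensitive to how poorly the few-sample surrogate approximates the target's shallow layers.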
