Paper Title
DualCF: Efficient Model Extraction Attack from Counterfactual Explanations
Paper Authors
Paper Abstract
Cloud service providers have launched Machine-Learning-as-a-Service (MLaaS) platforms to allow users to access large-scale cloud-based models via APIs. In addition to prediction outputs, these APIs can also provide other information in a more human-understandable way, such as counterfactual explanations (CFs). However, such extra information inevitably causes the cloud models to be more vulnerable to extraction attacks, which aim to steal the internal functionality of models in the cloud. Due to the black-box nature of cloud models, however, existing attack strategies require a vast number of queries before the substitute model achieves high fidelity. In this paper, we propose a novel, simple yet efficient querying strategy that greatly improves the query efficiency of stealing a classification model. This is motivated by our observation that current querying strategies suffer from the decision boundary shift issue, which is induced by taking far-distant queries and close-to-boundary CFs into substitute model training. We then propose the DualCF strategy to circumvent this issue, which takes not only the CF but also the counterfactual explanation of the CF (CCF) as pairs of training samples for the substitute model. Extensive and comprehensive experimental evaluations are conducted on both synthetic and real-world datasets. The experimental results favorably illustrate that DualCF can efficiently and effectively produce a high-fidelity substitute model with fewer queries.
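To make the querying strategy concrete, below is a minimal Python sketch of the DualCF collection loop described in the abstract. The query_api function, its (label, counterfactual) return signature, and the binary-classification label flip are illustrative assumptions, not the paper's actual protocol or a real MLaaS API.

import numpy as np

def query_api(x):
    """Placeholder for the black-box cloud API.

    Assumed to return (predicted_label, counterfactual) for input x;
    this signature is a stand-in supplied by the target MLaaS platform.
    """
    raise NotImplementedError

def dualcf_collect(seed_queries):
    """Collect (CF, CCF) training pairs for the substitute model.

    For each seed query x:
      1. Ask the API for x's counterfactual explanation (CF).
      2. Query the CF itself to obtain its counterfactual (CCF).
    Both CF and CCF lie close to the decision boundary but on opposite
    sides, which is the property the abstract credits with mitigating
    the decision boundary shift issue.
    """
    X, y = [], []
    for x in seed_queries:
        _, cf = query_api(x)           # CF of the seed query
        cf_label, ccf = query_api(cf)  # CF's label and its counterfactual (CCF)
        X.extend([cf, ccf])
        y.extend([cf_label, 1 - cf_label])  # binary case: CCF flips the CF's label
    return np.asarray(X), np.asarray(y)

The collected pairs (X, y) would then be used as the training set for the substitute classifier; each seed query costs only two API calls in this sketch.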