Paper Title
Reconstructing Training Data with Informed Adversaries
Paper Authors
Paper Abstract
Given access to a machine learning model, can an adversary reconstruct the model's training data? This work studies this question from the lens of a powerful informed adversary who knows all the training data points except one. By instantiating concrete attacks, we show it is feasible to reconstruct the remaining data point in this stringent threat model. For convex models (e.g. logistic regression), reconstruction attacks are simple and can be derived in closed form. For more general models (e.g. neural networks), we propose an attack strategy based on training a reconstructor network that receives as input the weights of the model under attack and produces as output the target data point. We demonstrate the effectiveness of our attack on image classifiers trained on MNIST and CIFAR-10, and systematically investigate which factors of standard machine learning pipelines affect reconstruction success. Finally, we theoretically investigate what amount of differential privacy suffices to mitigate reconstruction attacks by informed adversaries. Our work provides an effective reconstruction attack that model developers can use to assess memorization of individual points in general settings beyond those considered in previous works (e.g. generative language models or access to training gradients); it shows that standard models have the capacity to store enough information to enable high-fidelity reconstruction of training data points; and it demonstrates that differential privacy can successfully mitigate such attacks in a parameter regime where utility degradation is minimal.
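The informed-adversary setting, and the way noise blunts it, can be illustrated with a minimal toy sketch. This is an illustrative analogy, not the paper's actual attack: here the released "model" is simply the empirical mean of the training points, so an adversary who knows all points but one can recover the remaining point in closed form, while Gaussian noise on the released statistic (in the spirit of differential privacy) degrades the reconstruction.

```python
import random

# Toy setup: n training points in R^d; the released "model" is their mean.
random.seed(0)
n, d = 10, 3
data = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
model = [sum(x[j] for x in data) / n for j in range(d)]

# Informed adversary: knows every training point except the last one.
known, target = data[:-1], data[-1]

# Closed-form reconstruction: the missing point equals n * mean minus the
# sum of the known points, so recovery is exact (up to float error).
reconstruction = [n * model[j] - sum(x[j] for x in known) for j in range(d)]
assert all(abs(r - t) < 1e-9 for r, t in zip(reconstruction, target))

# Gaussian noise on the released model turns exact recovery into a noisy
# estimate: the adversary's error now scales with n * sigma.
sigma = 0.1
noisy_model = [m + random.gauss(0, sigma) for m in model]
noisy_reconstruction = [n * noisy_model[j] - sum(x[j] for x in known)
                        for j in range(d)]
print(max(abs(r - t) for r, t in zip(noisy_reconstruction, target)))
```

The same bookkeeping intuition is what the informed-adversary model formalizes: with all other points fixed, everything the released parameters reveal beyond the known points is attributable to the single target point.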