Paper Title
Anonymizing Machine Learning Models
Paper Authors
Paper Abstract
There is a known tension between the need to analyze personal data to drive business and privacy concerns. Many data protection regulations, including the EU General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), set out strict restrictions and obligations on the collection and processing of personal data. Moreover, machine learning models themselves can be used to derive personal information, as demonstrated by recent membership and attribute inference attacks. Anonymized data, however, is exempt from the obligations set out in these regulations. It is therefore desirable to be able to create models that are anonymized, thus exempting them from those obligations as well, in addition to providing better protection against attacks. Learning on anonymized data, however, typically results in a significant degradation in accuracy. In this work, we propose a method that achieves better model accuracy by using the knowledge encoded within the trained model and guiding the anonymization process to minimize the impact on the model's accuracy. We demonstrate that by focusing on the model's accuracy rather than on generic information-loss measures, our method outperforms state-of-the-art k-anonymity methods in terms of the achieved utility, particularly with high values of k and large numbers of quasi-identifiers. We also demonstrate that our approach is comparable to, and sometimes better than, approaches based on differential privacy at preventing membership inference attacks, while avoiding some of their drawbacks, such as complexity, performance overhead, and model-specific implementations. This makes model-guided anonymization a legitimate substitute for such methods and a practical approach to creating privacy-preserving models.
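To make the idea concrete, below is a minimal sketch of model-guided k-anonymization in Python. It illustrates only the general idea from the abstract, not the authors' actual algorithm: it assumes numeric quasi-identifiers, a binary scikit-learn classifier, and a simple heuristic that groups records the trained model scores similarly, on the premise that generalizing such records together should hurt accuracy the least. All names and parameters (`K`, `QI`, the toy dataset) are invented for the example.

```python
# Minimal sketch of accuracy-guided k-anonymization (illustration only,
# not the paper's algorithm). Assumptions: numeric quasi-identifiers and
# grouping by model score as a proxy for "generalizing these records
# together hurts accuracy the least".
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

K = 10            # the k in k-anonymity: every group must hold >= k records
QI = [0, 1, 2]    # column indices of the quasi-identifiers (hypothetical)

# Toy data standing in for personal records.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)

# 1. Train the model whose encoded knowledge guides the anonymization.
model = RandomForestClassifier(random_state=0).fit(X, y)

# 2. Sort records by the model's predicted probability and chunk consecutive
#    records into groups of size k, so records the model treats alike end up
#    in the same group.
order = np.argsort(model.predict_proba(X)[:, 1])
groups = [order[i:i + K] for i in range(0, len(order), K)]
if len(groups) > 1 and len(groups[-1]) < K:   # keep every group >= k
    tail = groups.pop()
    groups[-1] = np.concatenate([groups[-1], tail])

# 3. Generalize: replace each quasi-identifier with its group-level mean,
#    so all records in a group share identical quasi-identifier values and
#    the data is k-anonymous with respect to QI.
X_anon = X.copy()
for idx in groups:
    X_anon[np.ix_(idx, QI)] = X[np.ix_(idx, QI)].mean(axis=0)

# 4. Check how much utility survives: retrain on the anonymized data and
#    compare training accuracy against the original model.
retrained = RandomForestClassifier(random_state=0).fit(X_anon, y)
print("original accuracy:  ", model.score(X, y))
print("anonymized accuracy:", retrained.score(X_anon, y))
```

A faithful implementation would follow the paper's actual generalization procedure and evaluate utility on held-out data; this sketch is meant only to convey why guiding the grouping with the model's own outputs can preserve accuracy better than minimizing a generic information-loss measure.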