Paper Title

Subject Membership Inference Attacks in Federated Learning

Authors

Anshuman Suri, Pallika Kanani, Virendra J. Marathe, Daniel W. Peterson

Abstract

Privacy attacks on Machine Learning (ML) models often focus on inferring the existence of particular data points in the training data. However, what the adversary really wants to know is whether a particular individual's (subject's) data was included during training. In such scenarios, the adversary is more likely to have access to the distribution of a particular subject than to actual records. Furthermore, in settings like cross-silo Federated Learning (FL), a subject's data can be embodied by multiple data records that are spread across multiple organizations. Nearly all of the existing private FL literature is dedicated to studying privacy at two granularities -- item-level (individual data records) and user-level (participating user in the federation) -- neither of which applies to data subjects in cross-silo FL. This insight motivates us to shift our attention from the privacy of data records to the privacy of data subjects, also known as subject-level privacy. We propose two novel black-box attacks for subject membership inference, one of which assumes access to a model after each training round. Using these attacks, we estimate subject membership inference risk on real-world data for single-party models as well as FL scenarios. We find our attacks to be extremely potent, even without access to exact training records and with membership knowledge for only a handful of subjects. To better understand the various factors that may influence subject privacy risk in cross-silo FL settings, we systematically generate several hundred synthetic federation configurations, varying properties of the data, model design and training, and the federation itself. Finally, we investigate the effectiveness of Differential Privacy in mitigating this threat.
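The abstract does not spell out the attack mechanics. As a rough, hypothetical illustration of the general idea behind distribution-based membership inference (a generic loss-threshold heuristic, not the authors' actual method), an adversary might query the model on fresh records drawn from a subject's distribution and aggregate per-record losses; all names and the threshold below are illustrative assumptions:

```python
import math


def cross_entropy(prob_true_class: float) -> float:
    """Negative log-likelihood of the correct class; lower values mean
    the model fits the record better."""
    return -math.log(max(prob_true_class, 1e-12))


def subject_membership_score(model_confidences, threshold=0.7):
    """Hypothetical subject-level membership test (illustrative only).

    Queries the model on several records sampled from the subject's
    distribution (not the exact training records) and averages the
    per-record losses. A low average loss suggests the subject's data
    was seen during training.

    `model_confidences`: the model's probability for the true label on
    each sampled record. `threshold`: a confidence level that would in
    practice be calibrated on a few subjects with known membership.
    Returns (average_loss, is_member_guess).
    """
    losses = [cross_entropy(p) for p in model_confidences]
    avg_loss = sum(losses) / len(losses)
    return avg_loss, avg_loss < cross_entropy(threshold)
```

For example, a subject whose distribution the model fits well (high true-class confidences such as `[0.95, 0.9, 0.88, 0.97]`) would be flagged as a likely member, while one the model fits poorly (`[0.3, 0.2, 0.25, 0.1]`) would not; in the round-by-round variant the abstract mentions, such scores could additionally be tracked across training rounds.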
