Paper Title

How Much Privacy Does Federated Learning with Secure Aggregation Guarantee?

Paper Authors

Ahmed Roushdy Elkordy, Jiang Zhang, Yahya H. Ezzeldin, Konstantinos Psounis, Salman Avestimehr

Paper Abstract

Federated learning (FL) has attracted growing interest for enabling privacy-preserving machine learning on data stored at multiple users while avoiding moving the data off-device. However, while data never leaves users' devices, privacy still cannot be guaranteed since significant computations on users' training data are shared in the form of trained local models. These local models have recently been shown to pose a substantial privacy threat through different privacy attacks such as model inversion attacks. As a remedy, Secure Aggregation (SA) has been developed as a framework to preserve privacy in FL, by guaranteeing the server can only learn the global aggregated model update but not the individual model updates. While SA ensures no additional information is leaked about the individual model update beyond the aggregated model update, there are no formal guarantees on how much privacy FL with SA can actually offer; as information about the individual dataset can still potentially leak through the aggregated model computed at the server. In this work, we perform a first analysis of the formal privacy guarantees for FL with SA. Specifically, we use Mutual Information (MI) as a quantification metric and derive upper bounds on how much information about each user's dataset can leak through the aggregated model update. When using the FedSGD aggregation algorithm, our theoretical bounds show that the amount of privacy leakage reduces linearly with the number of users participating in FL with SA. To validate our theoretical bounds, we use an MI Neural Estimator to empirically evaluate the privacy leakage under different FL setups on both the MNIST and CIFAR10 datasets. Our experiments verify our theoretical bounds for FedSGD, which show a reduction in privacy leakage as the number of users and local batch size grow, and an increase in privacy leakage with the number of training rounds.
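
To make the guarantee described in the abstract concrete, the sketch below simulates additive-mask secure aggregation in plain NumPy: each pair of users shares a random mask that cancels in the sum, so the server recovers only the aggregated model update and never any individual update. This is a minimal sketch for intuition only, not the paper's protocol; production SA designs additionally handle key agreement, user dropouts, and malicious parties, and all variable names here are illustrative.

```python
import numpy as np

# Minimal sketch of additive-mask secure aggregation (illustrative only;
# real SA protocols also handle key agreement and user dropouts).
rng = np.random.default_rng(0)

N, d = 5, 4                        # number of users, model-update dimension
updates = rng.normal(size=(N, d))  # each user's local model update x_i

# Pairwise masks: for each pair i < j, user i adds masks[i, j] and user j
# subtracts it, so every mask cancels in the sum across users.
masks = rng.normal(size=(N, N, d))
masked = []
for i in range(N):
    x = updates[i].copy()
    for j in range(N):
        if j > i:
            x += masks[i, j]
        elif j < i:
            x -= masks[j, i]
    masked.append(x)

# The server only ever sees the masked updates; their sum equals the
# true aggregate, while each masked update individually looks random.
aggregate = np.sum(masked, axis=0)
assert np.allclose(aggregate, updates.sum(axis=0))
print(aggregate)
```

The paper's analysis then asks how much mutual information about a single user's dataset can still leak through this aggregate, and shows for FedSGD that the leakage shrinks as the number of participating users grows.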
