Title

Corella: A Private Multi Server Learning Approach based on Correlated Queries

Authors

Hamidreza Ehteram, Mohammad Ali Maddah-Ali, Mahtab Mirmohseni

Abstract


The emerging applications of machine learning algorithms on mobile devices motivate us to offload the computation tasks of training a model or deploying a trained one to the cloud or at the edge of the network. One of the major challenges in this setup is to guarantee the privacy of the client data. Various methods have been proposed to protect privacy in the literature. Those include (i) adding noise to the client data, which reduces the accuracy of the result, (ii) using secure multiparty computation (MPC), which requires significant communication among the computing nodes or with the client, (iii) relying on homomorphic encryption (HE) methods, which significantly increases computation load at the servers. In this paper, we propose $\textit{Corella}$ as an alternative approach to protect the privacy of data. The proposed scheme relies on a cluster of servers, where at most $T \in \mathbb{N}$ of them may collude, each running a learning model (e.g., a deep neural network). Each server is fed with the client data, added with $\textit{strong}$ noise, independent from user data. The variance of the noise is set to be large enough to make the information leakage to any subset of up to $T$ servers information-theoretically negligible. On the other hand, the added noises for different servers are $\textit{correlated}$. This correlation among the queries allows the parameters of the models running on different servers to be $\textit{trained}$ such that the client can mitigate the contribution of the noises by combining the outputs of the servers, and recover the final result with high accuracy and with a minor computational effort. Simulation results for various datasets demonstrate the accuracy of the proposed approach for the classification, using deep neural networks, and the autoencoder, as supervised and unsupervised learning tasks, respectively.
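To make the correlated-query idea concrete, here is a minimal sketch for the simplest setting the abstract describes: $T = 1$ and two servers. All names are our own, and the servers are simplified to a shared linear model so that the client's combination cancels the noise exactly; in Corella itself the servers run trained deep models, so the cancellation is learned and approximate rather than algebraic.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 8, 3
W = rng.standard_normal((d_out, d_in))   # stand-in for the (trained) server model

def server(query):
    """Each server applies its model to a noisy query; it never sees x alone."""
    return W @ query

x = rng.standard_normal(d_in)            # private client data
z = 10.0 * rng.standard_normal(d_in)     # strong noise, variance >> data variance

y1 = server(x + z)                       # query to server 1: data plus noise
y2 = server(z)                           # correlated query to server 2: noise only

result = y1 - y2                         # client combines outputs; noise cancels
print(np.allclose(result, W @ x))        # True: exact here because W is linear
```

Each server individually observes only a noise-dominated query (`x + z` or `z`), which mirrors the information-theoretic leakage argument, while the client recovers `W @ x` with a single subtraction, matching the claimed minor computational effort on the client side.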
