Paper title
Castell: Scalable Joint Probability Estimation of Multi-dimensional Data Randomized with Local Differential Privacy
Paper authors
Paper abstract
Performing randomized response (RR) over multi-dimensional data is subject to the curse of dimensionality. As the number of attributes increases, the number of attribute-value combinations grows exponentially, which severely degrades both the computational cost and the accuracy of the RR estimates. In this paper, we propose a new multi-dimensional RR scheme that randomizes all attributes independently and then aggregates the per-attribute randomization matrices into a single aggregated matrix, from which the multi-dimensional joint probability distributions are estimated. The inverse of the aggregated randomization matrix can be computed efficiently, at a computational cost that is linear in the dimensionality and with manageable storage requirements. To overcome the limited accuracy of this baseline protocol, we propose two extensions, called the {\em hybrid} and {\em truncated} schemes. Finally, we report experiments on synthetic and major open-source datasets for various numbers of attributes, domain sizes, and numbers of respondents. On the UCI Adult dataset, the average distance between the estimated and the real (2- through 6-way) joint probabilities is $0.0099$ for the {\em truncated} scheme and $0.0155$ for the {\em hybrid} scheme, whereas the corresponding distances are $0.03$ and $0.04$ for LoPub, the state-of-the-art multi-dimensional LDP scheme.
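To illustrate why independent per-attribute randomization keeps the inversion cheap, here is a minimal sketch in Python. It is not the paper's implementation: it assumes each attribute uses standard k-ary randomized response with a hypothetical per-attribute parameter `eps`, and it models the aggregated matrix as the Kronecker product of the per-attribute matrices, which is what independent randomization implies. Under that assumption the inverse of the aggregated matrix is the Kronecker product of the small per-attribute inverses, so the full matrix never has to be built. The function names are illustrative only, and the hybrid and truncated extensions are not covered.

```python
import numpy as np


def rr_matrix(k, eps):
    """k-ary randomized response matrix, P[i, j] = Pr(report i | true value j).

    Assumption: standard k-ary RR with parameter eps for a single attribute.
    """
    denom = np.exp(eps) + k - 1
    p = np.exp(eps) / denom  # probability of reporting the true value
    q = 1.0 / denom          # probability of reporting one specific other value
    return np.full((k, k), q) + (p - q) * np.eye(k)


def apply_per_axis(mats, tensor):
    """Apply mats[a] along axis a of tensor.

    Equivalent to multiplying the flattened tensor by the Kronecker product
    of all matrices, without ever forming that large matrix.
    """
    out = tensor
    for axis, m in enumerate(mats):
        # Contract the matrix columns with the given tensor axis, then move
        # the resulting axis back to its original position.
        out = np.moveaxis(np.tensordot(m, out, axes=(1, axis)), 0, axis)
    return out


def estimate_joint(reported_joint, mats):
    """Estimate the true joint distribution from the reported one.

    Only the small per-attribute matrices are inverted, one per dimension.
    """
    inverses = [np.linalg.inv(m) for m in mats]
    return apply_per_axis(inverses, reported_joint)
```

A possible usage: build `mats = [rr_matrix(k, 1.0) for k in (2, 3, 4)]`, tabulate the randomized reports into a normalized 2 x 3 x 4 frequency array, and pass both to `estimate_joint`. With a finite number of respondents the estimate is noisy and may contain small negative entries, which is the kind of accuracy limitation the abstract says the two extensions are meant to address.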