Castell: Scalable Joint Probability Estimation of Multi-dimensional Data Randomized with Local Differential Privacy

Paper Title

Castell: Scalable Joint Probability Estimation of Multi-dimensional Data Randomized with Local Differential Privacy

Authors

Kikuchi, Hiroaki

Abstract

Performing randomized response (RR) over multi-dimensional data is subject to the curse of dimensionality. As the number of attributes increases, the exponential growth in the number of attribute-value combinations greatly impacts both the computational cost and the accuracy of RR estimates. In this paper, we propose a new multi-dimensional RR scheme that randomizes all attributes independently and then aggregates the per-attribute randomization matrices into a single aggregated matrix, from which the multi-dimensional joint probability distributions are estimated. The inverse of the aggregated randomization matrix can be computed efficiently, at a lightweight computational cost (i.e., linear with respect to dimensionality) and with manageable storage requirements. To overcome the accuracy limitation of this baseline protocol, we propose two extensions, called the hybrid and truncated schemes. Finally, we conducted experiments on synthetic and major open-source datasets with various numbers of attributes, domain sizes, and numbers of respondents. On the UCI Adult dataset, the average distances between the estimated and the real 2- through 6-way joint probabilities are 0.0099 for the truncated scheme and 0.0155 for the hybrid scheme, whereas they are 0.03 and 0.04 for LoPub, the state-of-the-art multi-dimensional LDP scheme.
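The abstract describes the baseline idea only at a high level. The sketch below illustrates it under stated assumptions: each attribute is perturbed with standard k-ary randomized response (k-RR) under its own budget epsilon, and the "aggregated matrix" is taken to be the Kronecker product of the per-attribute matrices, which is what independent per-attribute randomization implies for the joint domain. Under that assumption the inverse factorizes as (P_1 ⊗ ... ⊗ P_d)^-1 = P_1^-1 ⊗ ... ⊗ P_d^-1, so the joint estimate can be obtained by applying each small inverse along its own axis without materializing the full matrix. All function names (`krr_matrix`, `randomize`, `estimate_joint`, etc.) are hypothetical; this is not the paper's implementation, and the hybrid and truncated extensions are not modeled.

```python
import numpy as np
from functools import reduce


def krr_matrix(k: int, epsilon: float) -> np.ndarray:
    """Assumed k-RR matrix with P[y, x] = Pr[report y | true value x] (column-stochastic)."""
    denom = np.exp(epsilon) + k - 1
    p = np.exp(epsilon) / denom  # probability of reporting the true value
    q = 1.0 / denom              # probability of reporting one particular other value
    return np.full((k, k), q) + (p - q) * np.eye(k)


def randomize(record, mats, rng):
    """Perturb every attribute of one record independently with its own k-RR matrix."""
    return tuple(int(rng.choice(m.shape[0], p=m[:, x])) for x, m in zip(record, mats))


def aggregated_matrix(mats):
    """Full joint randomization matrix as a Kronecker product (assumed aggregation).

    Its size grows exponentially with the number of attributes, so it is built
    here only to check the factorized computation."""
    return reduce(np.kron, mats)


def estimate_from_frequencies(freqs, mats):
    """Invert the aggregated matrix one attribute axis at a time.

    Relies on (P_1 kron ... kron P_d)^-1 = P_1^-1 kron ... kron P_d^-1, so only
    the d small inverses are computed and stored."""
    est = np.asarray(freqs, dtype=float)
    for axis, m in enumerate(mats):
        inv = np.linalg.inv(m)
        # Contract inv's column index with this axis, then move the new axis back in place.
        est = np.moveaxis(np.tensordot(inv, est, axes=([1], [axis])), 0, axis)
    return est


def estimate_joint(reports, mats):
    """Unbiased estimate of the full joint distribution from randomized reports.

    Entries can come out slightly negative; no post-processing is applied."""
    dims = [m.shape[0] for m in mats]
    counts = np.zeros(dims)
    np.add.at(counts, tuple(np.asarray(reports).T), 1.0)
    return estimate_from_frequencies(counts / len(reports), mats)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dims, eps, n = [2, 3, 4], 2.0, 5000
    mats = [krr_matrix(k, eps) for k in dims]
    # Synthetic true records: one column per attribute.
    records = np.column_stack([rng.integers(0, k, size=n) for k in dims])
    reports = [randomize(r, mats, rng) for r in records]

    truth = np.zeros(dims)
    np.add.at(truth, tuple(records.T), 1.0 / n)
    est = estimate_joint(reports, mats)
    print("max abs error:", np.abs(est - truth).max())
```

Running the script perturbs 5,000 synthetic records over three attributes and prints the largest absolute error of the joint estimate. Materializing the full matrix would need (∏ k_i)^2 entries, while the per-axis loop stores only Σ k_i^2 entries, which is the kind of saving the abstract attributes to the aggregated-matrix design.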
