Paper Title

Differentially Private Synthetic Data: Applied Evaluations and Enhancements

Paper Authors

Lucas Rosenblatt, Xiaoyan Liu, Samira Pouyanfar, Eduardo de Leon, Anuj Desai, Joshua Allen

Abstract

Machine learning practitioners frequently seek to leverage the most informative available data, without violating the data owner's privacy, when building predictive models. Differentially private data synthesis protects personal details from exposure, and allows for the training of differentially private machine learning models on privately generated datasets. But how can we effectively assess the efficacy of differentially private synthetic data? In this paper, we survey four differentially private generative adversarial networks for data synthesis. We evaluate each of them at scale on five standard tabular datasets, and in two applied industry scenarios. We benchmark with novel metrics from recent literature and other standard machine learning tools. Our results suggest some synthesizers are more applicable for different privacy budgets, and we further demonstrate complicating domain-based tradeoffs in selecting an approach. We offer experimental learning on applied machine learning scenarios with private internal data to researchers and practitioners alike. In addition, we propose QUAIL, an ensemble-based modeling approach to generating synthetic data. We examine QUAIL's tradeoffs, and note circumstances in which it outperforms baseline differentially private supervised learning models under the same budget constraint.
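The "privacy budgets" the abstract refers to are the ε parameter of differential privacy: a smaller ε means stronger privacy and noisier releases, and methods operating "under the same budget constraint" must split ε across their components. As general background (this is not the paper's own code, and the function name and example values are illustrative), a minimal pure-Python sketch of the classic Laplace mechanism shows how ε controls the noise scale:

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Release true_value with Laplace noise scaled to sensitivity / epsilon.
    A smaller privacy budget epsilon forces a larger noise scale,
    i.e. stronger privacy at the cost of accuracy."""
    scale = sensitivity / epsilon
    # Inverse-CDF sampling of Laplace(0, scale):
    # X = -scale * sign(U) * ln(1 - 2|U|), with U ~ Uniform(-0.5, 0.5).
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise

# A total budget of epsilon = 1.0 split evenly between two private
# releases (as in budget-splitting schemes) leaves 0.5 per release.
private_count = laplace_mechanism(128.0, sensitivity=1.0, epsilon=0.5)
```

By sequential composition, spending ε₁ on a synthesizer and ε₂ on a separately trained private model yields a total budget of ε₁ + ε₂, which is why splitting a fixed budget across an ensemble involves the tradeoffs the abstract describes.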
