Paper title
Learning from aggregated data with a maximum entropy model
Authors
Abstract
Aggregating a dataset, then injecting some noise, is a simple and common way to release differentially private data. However, aggregated data, even without noise, is not an appropriate input for machine learning classifiers. In this work, we show how a new model, similar to a logistic regression, may be learned from aggregated data only, by approximating the unobserved feature distribution with a maximum entropy hypothesis. The resulting model is a Markov Random Field (MRF), and we detail how to apply, modify and scale an MRF training algorithm to our setting. Finally, we present empirical evidence on several public datasets that the model learned this way can achieve performance comparable to that of a logistic model trained on the full unaggregated data.
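To make the "aggregate, then inject noise" release concrete, the following is a minimal sketch and not the paper's procedure. The abstract does not specify the aggregation scheme, so counting examples and positive labels per single (feature, value) pair is an assumption made here for illustration. The function names `aggregate_counts` and `add_laplace_noise`, the toy data, and the noise scale are all hypothetical, and the scale is not calibrated to any privacy budget.

```python
import random
from collections import Counter


def aggregate_counts(rows, labels):
    """Count examples and positive labels for each (feature index, value) pair."""
    totals, positives = Counter(), Counter()
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            totals[(j, value)] += 1
            positives[(j, value)] += label
    return totals, positives


def add_laplace_noise(counts, scale, rng):
    """Return a copy of a count table with Laplace(0, scale) noise added per cell."""
    noisy = {}
    for key, count in counts.items():
        # The difference of two independent exponential draws with mean
        # `scale` follows a Laplace(0, scale) distribution.
        noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        noisy[key] = count + noise
    return noisy


# Toy dataset (hypothetical): two categorical features and a binary label.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = [1, 0, 1, 1]

totals, positives = aggregate_counts(rows, labels)
# Illustrative scale only; a real release would derive it from a privacy budget.
noisy_positives = add_laplace_noise(positives, scale=1.0, rng=random.Random(0))
```

Only tables such as `totals` and `noisy_positives` would be released in this setting. The individual rows, and therefore the joint distribution of the features, are not observed, which is the gap the abstract proposes to fill with a maximum entropy hypothesis.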