DPXPlain: Privately Explaining Aggregate Query Answers

Paper title

DPXPlain: Privately Explaining Aggregate Query Answers

Authors

Yuchao Tao, Amir Gilad, Ashwin Machanavajjhala, Sudeepa Roy

Abstract

Differential privacy (DP) is the state-of-the-art and rigorous notion of privacy for answering aggregate database queries while preserving the privacy of sensitive information in the data. In today's era of data analysis, however, it poses new challenges for users to understand the trends and anomalies observed in the query results: Is the unexpected answer due to the data itself, or is it due to the extra noise that must be added to preserve DP? In the second case, even the observation made by the users on query results may be wrong. In the first case, can we still mine interesting explanations from the sensitive data while protecting its privacy? To address these challenges, we present a three-phase framework DPXPlain, which is the first system to the best of our knowledge for explaining group-by aggregate query answers with DP. In its three phases, DPXPlain (a) answers a group-by aggregate query with DP, (b) allows users to compare aggregate values of two groups and with high probability assesses whether this comparison holds or is flipped by the DP noise, and (c) eventually provides an explanation table containing the approximately `top-k' explanation predicates along with their relative influences and ranks in the form of confidence intervals, while guaranteeing DP in all steps. We perform an extensive experimental analysis of DPXPlain with multiple use-cases on real and synthetic data showing that DPXPlain efficiently provides insightful explanations with good accuracy and utility.
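Phases (a) and (b) of the abstract can be illustrated with a small sketch. This is not the DPXPlain implementation: it assumes a COUNT query, the standard Laplace mechanism, a publicly known list of groups, and a deliberately conservative union-bound interval. All function names (`noisy_group_counts`, `difference_interval`, `comparison_is_reliable`) are hypothetical and chosen only for illustration.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_group_counts(rows, group_key, groups, epsilon, rng):
    """Phase (a), sketched: a group-by COUNT answered with Laplace noise.

    Assumption: `groups` is a public list of group values, so the set of
    released keys does not depend on the sensitive data.
    """
    counts = {g: 0 for g in groups}
    for row in rows:
        if row[group_key] in counts:
            counts[row[group_key]] += 1
    # Adding or removing one row changes exactly one count by 1,
    # so the L1 sensitivity is 1 and the noise scale is 1 / epsilon.
    scale = 1.0 / epsilon
    noisy = {g: c + laplace_noise(scale, rng) for g, c in counts.items()}
    return noisy, scale


def difference_interval(noisy_a, noisy_b, scale, beta=0.05):
    """Phase (b), sketched: an interval for the true difference a - b.

    For Laplace noise, P(|X| > t) = exp(-t / scale). With
    t = scale * ln(2 / beta), both noise terms stay within t with
    probability at least 1 - beta (union bound), so the true difference
    lies within 2t of the noisy difference. This is post-processing and
    spends no additional privacy budget.
    """
    t = scale * math.log(2.0 / beta)
    diff = noisy_a - noisy_b
    return diff - 2.0 * t, diff + 2.0 * t


def comparison_is_reliable(interval):
    # The observed ordering is trusted only if the interval excludes zero.
    low, high = interval
    return low > 0 or high < 0


if __name__ == "__main__":
    rng = random.Random(0)
    rows = [{"dept": "A"}] * 120 + [{"dept": "B"}] * 60
    noisy, scale = noisy_group_counts(rows, "dept", ["A", "B"], 1.0, rng)
    interval = difference_interval(noisy["A"], noisy["B"], scale)
    print(noisy, interval, comparison_is_reliable(interval))
```

When the interval contains zero, the observed ordering of the two groups could plausibly be an artifact of the noise rather than of the data, which is the situation the abstract calls a comparison "flipped by the DP noise".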
