Paper Title
I Prefer not to Say: Protecting User Consent in Models with Optional Personal Data
Paper Authors
Paper Abstract
We examine machine learning models in a setup where individuals can choose to share optional personal information with a decision-making system, as seen in modern insurance pricing models. Some users consent to their data being used, whereas others object and keep their data undisclosed. In this work, we show that the decision not to share data can be considered information in itself that should be protected to respect users' privacy. This observation raises the overlooked problem of how to ensure that users who protect their personal data do not suffer any disadvantages as a result. To address this problem, we formalize protection requirements for models that use only the information for which active user consent was obtained. This excludes implicit information contained in the decision to share data or not. We offer the first solution to this problem by proposing the notion of Protected User Consent (PUC), which we prove to be loss-optimal under our protection requirements. We observe that privacy and performance are not fundamentally at odds with each other and that it is possible for a decision maker to benefit from additional data while respecting users' consent. To learn PUC-compliant models, we devise a model-agnostic data augmentation strategy with finite-sample convergence guarantees. Finally, we analyze the implications of PUC on challenging real datasets, tasks, and models.
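The core problem can be made concrete with a small simulation (a sketch of our own, not code from the paper; all variable names, distributions, and parameters below are illustrative assumptions). When withholding an optional feature correlates with the outcome, a standard model trained with the common "impute plus missingness indicator" encoding puts weight on the indicator, i.e., it scores users differently merely because they declined to share:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical optional personal feature x and an outcome y correlated with it.
x = rng.normal(size=n)
y = (x + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Assumption: higher-risk users withhold consent more often,
# so the act of withholding itself leaks information about y.
withhold = rng.random(n) < 1 / (1 + np.exp(-2 * x))
x_obs = np.where(withhold, 0.0, x)   # common encoding: impute withheld values...
m = withhold.astype(float)           # ...and add a missingness indicator

clf = LogisticRegression().fit(np.column_stack([x_obs, m]), y)
print("weight on missingness indicator:", clf.coef_[0][1])

In this toy setup the learned indicator weight is clearly positive, so users who withhold are treated as higher-risk purely for not consenting; this is exactly the implicit information contained in the non-disclosure decision that PUC-compliant models must not exploit.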