Privacy-preserving Deep Learning based Record Linkage

Paper Title

Privacy-preserving Deep Learning based Record Linkage

Authors

Thilina Ranbaduge, Dinusha Vatsalan, Ming Ding

Abstract

Deep learning-based linkage of records across different databases is becoming increasingly useful in data integration and mining applications to discover new insights from multiple sources of data. However, due to privacy and confidentiality concerns, organisations are often unwilling or not allowed to share their sensitive data with any external parties, thus making it challenging to build/train deep learning models for record linkage across different organisations' databases. To overcome this limitation, we propose the first deep learning-based multi-party privacy-preserving record linkage (PPRL) protocol that can be used to link sensitive databases held by multiple different organisations. In our approach, each database owner first trains a local deep learning model, which is then uploaded to a secure environment and securely aggregated to create a global model. The global model is then used by a linkage unit to classify unlabelled record pairs as matches or non-matches. We utilise differential privacy to achieve provable privacy protection against re-identification attacks. We evaluate the linkage quality and scalability of our approach using several large real-world databases, showing that it can achieve high linkage quality while providing sufficient privacy protection against existing attacks.
