Differentially Private Synthetic Medical Data Generation using Convolutional GANs

Paper title

Differentially Private Synthetic Medical Data Generation using Convolutional GANs

Authors

Amirsina Torfi, Edward A. Fox, Chandan K. Reddy

Abstract

Deep learning models have demonstrated superior performance in several application areas, such as image classification and speech processing. However, building a deep learning model from health record data requires addressing privacy challenges that raise unique concerns for researchers working in this domain. One effective way to handle such private data is to generate realistic synthetic data that offers practically acceptable data quality and, correspondingly, model performance. To tackle this challenge, we develop a differentially private framework for synthetic data generation using Rényi differential privacy. Our approach builds on convolutional autoencoders and convolutional generative adversarial networks to preserve some of the critical characteristics of the generated synthetic data. In addition, our model can capture the temporal information and feature correlations that might be present in the original data. We demonstrate that our model outperforms existing state-of-the-art models under the same privacy budget on several publicly available benchmark medical datasets, in both supervised and unsupervised settings.
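The abstract compares models "under the same privacy budget" tracked with Rényi differential privacy (RDP). As a rough illustration of what that accounting involves, the sketch below composes the standard RDP bound for a Gaussian mechanism over several training steps and converts it to an (ε, δ) guarantee. It is not the paper's accountant: it assumes L2 sensitivity 1, ignores subsampling amplification, and the function names, the noise scale, the step count, and the range of orders are all hypothetical.

```python
import math


def gaussian_rdp(alpha: float, sigma: float) -> float:
    """RDP epsilon at order alpha for a Gaussian mechanism.

    Assumes L2 sensitivity 1 and noise standard deviation sigma,
    which gives the standard bound alpha / (2 * sigma^2).
    """
    return alpha / (2.0 * sigma ** 2)


def rdp_to_dp(alpha: float, rdp_eps: float, delta: float) -> float:
    """Convert an (alpha, rdp_eps)-RDP guarantee to (eps, delta)-DP."""
    return rdp_eps + math.log(1.0 / delta) / (alpha - 1.0)


def epsilon_after_steps(sigma: float, steps: int, delta: float,
                        orders=range(2, 65)) -> float:
    """Smallest (eps, delta)-DP epsilon over a grid of RDP orders.

    RDP composes additively, so `steps` applications of the same
    mechanism cost steps * gaussian_rdp(alpha, sigma) at each order.
    The grid of orders is an illustrative assumption.
    """
    best = math.inf
    for alpha in orders:
        total_rdp = steps * gaussian_rdp(alpha, sigma)
        best = min(best, rdp_to_dp(alpha, total_rdp, delta))
    return best


if __name__ == "__main__":
    # Hypothetical settings, chosen only to exercise the functions.
    print(epsilon_after_steps(sigma=4.0, steps=100, delta=1e-5))
```

Under this simplified accounting, more training steps or less noise both raise ε, so two models are only comparable when their noise scale, step count, and δ yield the same final ε.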
