Paper Title

Invertible Tabular GANs: Killing Two Birds with OneStone for Tabular Data Synthesis

Authors

Jaehoon Lee, Jihyeon Hyeong, Jinsung Jeon, Noseong Park, Jihoon Cho

Abstract

Tabular data synthesis has received wide attention in the literature. This is because available data is often limited, incomplete, or difficult to obtain, and because data privacy is becoming increasingly important. In this work, we present a generalized GAN framework for tabular synthesis that combines the adversarial training of GANs with the negative log-density regularization of invertible neural networks. The proposed framework can be used for two distinct objectives. First, we can further improve the synthesis quality by decreasing the negative log-density of real records during adversarial training. Second, by increasing the negative log-density of real records, we can synthesize realistic fake records that are not too close to real records, which reduces the chance of potential information leakage. We conduct experiments with real-world datasets for classification, regression, and privacy attacks. In general, the proposed method demonstrates the best synthesis quality (in terms of task-oriented evaluation metrics, e.g., F1) when the negative log-density is decreased during adversarial training. When the negative log-density is increased, our experimental results show that the distance between real and fake records increases, enhancing robustness against privacy attacks.
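
The combination of an adversarial loss with a negative log-density term described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the elementwise affine flow, the function names, and the weight `gamma` are all assumptions made for illustration, and the adversarial loss is passed in as a precomputed number. Under these assumptions, a positive `gamma` pushes training to decrease the negative log-density of real records (the synthesis-quality objective), while a negative `gamma` pushes it to increase that density term (the privacy objective).

```python
import math


def negative_log_density(record, shift, scale):
    """Negative log-density of one record under a toy invertible map.

    Assumption: x = shift + scale * z elementwise, with z ~ N(0, 1).
    By the change-of-variables formula,
    -log p(x) = 0.5 * z^2 + 0.5 * log(2*pi) + log|scale| per dimension.
    """
    nll = 0.0
    for x, m, s in zip(record, shift, scale):
        z = (x - m) / s  # invert the affine map
        nll += 0.5 * z * z + 0.5 * math.log(2.0 * math.pi) + math.log(abs(s))
    return nll


def regularized_generator_loss(adv_loss, real_records, shift, scale, gamma):
    """Adversarial loss plus gamma times the mean negative log-density.

    gamma is a hypothetical weight: gamma > 0 rewards a lower negative
    log-density on real records, gamma < 0 rewards a higher one.
    """
    mean_nll = sum(
        negative_log_density(r, shift, scale) for r in real_records
    ) / len(real_records)
    return adv_loss + gamma * mean_nll


if __name__ == "__main__":
    real = [[0.1, -0.3], [0.4, 0.2]]
    shift, scale = [0.0, 0.0], [1.0, 1.0]
    print(regularized_generator_loss(0.7, real, shift, scale, gamma=0.1))
    print(regularized_generator_loss(0.7, real, shift, scale, gamma=-0.1))
```

In a real model the shift and scale would be replaced by a learned invertible network, and the sign and size of the weight would be tuned per dataset; the sketch only shows how a single signed coefficient can switch between the two objectives.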
