Paper Title
What You See is What You Get: Principled Deep Learning via Distributional Generalization
Paper Authors
Paper Abstract
Having similar behavior at training time and test time, what we call a "What You See Is What You Get" (WYSIWYG) property, is desirable in machine learning. Models trained with standard stochastic gradient descent (SGD), however, do not necessarily have this property, as their complex behaviors such as robustness or subgroup performance can differ drastically between training and test time. In contrast, we show that Differentially-Private (DP) training provably ensures the high-level WYSIWYG property, which we quantify using a notion of distributional generalization. Applying this connection, we introduce new conceptual tools for designing deep-learning methods by reducing generalization concerns to optimization ones: to mitigate unwanted behavior at test time, it is provably sufficient to mitigate this behavior on the training data. By applying this novel design principle, which bypasses "pathologies" of SGD, we construct simple algorithms that are competitive with SOTA in several distributional-robustness applications, significantly improve the privacy vs. disparate impact trade-off of DP-SGD, and mitigate robust overfitting in adversarial training. Finally, we also improve on theoretical bounds relating DP, stability, and distributional generalization.
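For context, the following is a rough sketch of the kind of guarantee a distributional-generalization (WYSIWYG) notion expresses; the symbols used here (trained model $h$, training set $S$, population distribution $\mathcal{D}$, and a family of bounded test functions $\mathcal{F}$) are illustrative assumptions, and the paper's precise formalization may differ:

$$\left|\;\mathbb{E}_{(x,y)\sim S,\,h}\big[f\big(x,\,y,\,h(x)\big)\big]\;-\;\mathbb{E}_{(x,y)\sim \mathcal{D},\,h}\big[f\big(x,\,y,\,h(x)\big)\big]\;\right|\;\le\;\varepsilon \qquad \text{for all tests } f \in \mathcal{F}.$$

Informally, any behavior measurable by such a test $f$ on the training data (for example, error on a subgroup, or accuracy under a fixed perturbation) transfers, up to $\varepsilon$, to fresh test data, which is what allows the design principle described in the abstract to reduce a test-time generalization concern to a training-time optimization one.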