Paper Title

Beyond Uniform Lipschitz Condition in Differentially Private Optimization

Authors

Rudrajit Das, Satyen Kale, Zheng Xu, Tong Zhang, Sujay Sanghavi

Abstract

Most prior results on differentially private stochastic gradient descent (DP-SGD) are derived under the simplistic assumption of uniform Lipschitzness, i.e., the per-sample gradients are uniformly bounded. We generalize uniform Lipschitzness by assuming that the per-sample gradients have sample-dependent upper bounds, i.e., per-sample Lipschitz constants, which themselves may be unbounded. We provide principled guidance on choosing the clip norm in DP-SGD for convex over-parameterized settings satisfying our general version of Lipschitzness when the per-sample Lipschitz constants are bounded; specifically, we recommend tuning the clip norm only till values up to the minimum per-sample Lipschitz constant. This finds application in the private training of a softmax layer on top of a deep network pre-trained on public data. We verify the efficacy of our recommendation via experiments on 8 datasets. Furthermore, we provide new convergence results for DP-SGD on convex and nonconvex functions when the Lipschitz constants are unbounded but have bounded moments, i.e., they are heavy-tailed.
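
For reference, below is a minimal NumPy sketch of a single DP-SGD step with per-sample gradient clipping and Gaussian noise. This is an illustrative reconstruction of the standard DP-SGD recipe, not the authors' implementation; names such as `dp_sgd_step`, `clip_norm`, and `noise_multiplier` are assumptions introduced for the example. The paper's recommendation concerns how `clip_norm` is tuned: in the convex over-parameterized setting, it suffices to search only over values up to the minimum per-sample Lipschitz constant.

```python
# Minimal sketch of one DP-SGD step (per-sample clipping + Gaussian noise).
# Illustrative only; not the authors' code.
import numpy as np

def dp_sgd_step(params, per_sample_grads, clip_norm, noise_multiplier, lr, rng):
    """One DP-SGD update on a batch of per-sample gradients.

    per_sample_grads: array of shape (batch_size, dim), one gradient per sample.
    clip_norm: clipping threshold C; per the paper's recommendation, tuned only
        over values up to the minimum per-sample Lipschitz constant.
    noise_multiplier: Gaussian noise scale relative to C (assumed name).
    """
    batch_size = per_sample_grads.shape[0]

    # Clip each per-sample gradient to Euclidean norm at most clip_norm.
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale

    # Sum the clipped gradients, add noise calibrated to clip_norm, and average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / batch_size

    # Gradient-descent update with the privatized mean gradient.
    return params - lr * noisy_mean
```

In the paper's softmax-layer application, the per-sample gradients above would come from a linear (softmax) layer trained on features extracted by a deep network pre-trained on public data, and `clip_norm` would be searched only up to the smallest per-sample Lipschitz constant of that convex loss.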
