Paper Title
Homogenization of SGD in high-dimensions: Exact dynamics and generalization properties
Paper Authors
Paper Abstract
We develop a stochastic differential equation, called homogenized SGD, for analyzing the dynamics of stochastic gradient descent (SGD) on a high-dimensional random least squares problem with $\ell^2$-regularization. We show that homogenized SGD is the high-dimensional equivalent of SGD -- for any quadratic statistic (e.g., population risk with quadratic loss), the statistic under the iterates of SGD converges to the statistic under homogenized SGD when the number of samples $n$ and number of features $d$ are polynomially related ($d^c < n < d^{1/c}$ for some $c > 0$). By analyzing homogenized SGD, we provide exact non-asymptotic high-dimensional expressions for the generalization performance of SGD in terms of the solution of a Volterra integral equation. Further, we provide the exact value of the limiting excess risk in the case of quadratic losses when trained by SGD. The analysis is formulated for data matrices and target vectors that satisfy a family of resolvent conditions, which can roughly be viewed as a weak (non-quantitative) form of delocalization of the sample-side singular vectors of the data. Several motivating applications are provided, including sample covariance matrices with independent samples and random features with non-generative model targets.
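To make the setting concrete, the following is a minimal sketch (not the paper's construction) of single-sample SGD on an $\ell^2$-regularized least-squares problem of the kind the abstract describes, tracking the regularized empirical risk along the iterates. All dimensions, step sizes, regularization strengths, and noise levels below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 200          # samples and features, polynomially related
delta = 0.01             # l2-regularization strength (illustrative)
gamma = 0.5 / d          # step size, small enough for stability

A = rng.standard_normal((n, d))                 # random data matrix
x_star = rng.standard_normal(d) / np.sqrt(d)    # ground-truth signal, unit norm
b = A @ x_star + 0.5 * rng.standard_normal(n)   # noisy targets

def risk(x):
    """Regularized least-squares risk: 0.5/n * ||Ax - b||^2 + 0.5*delta*||x||^2."""
    return 0.5 * np.mean((A @ x - b) ** 2) + 0.5 * delta * (x @ x)

x = np.zeros(d)
risks = [risk(x)]
for _ in range(20 * n):                  # several effective passes over the data
    i = rng.integers(n)                  # draw one sample uniformly at random
    grad = (A[i] @ x - b[i]) * A[i] + delta * x   # single-sample gradient
    x -= gamma * grad
    risks.append(risk(x))
```

The trajectory recorded in `risks` is an example of a quadratic statistic of the iterates; it is this kind of curve whose high-dimensional limit the Volterra-equation analysis characterizes.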