Paper Title

The Implicit Regularization of Momentum Gradient Descent with Early Stopping

Authors

Li Wang, Yingcong Zhou, Zhiguo Fu

Abstract

The study of the implicit regularization induced by gradient-based optimization is a longstanding pursuit. In the present paper, we characterize the implicit regularization of momentum gradient descent (MGD) with early stopping by comparing it with explicit $\ell_2$-regularization (ridge). In detail, we study MGD in the continuous-time view, the so-called momentum gradient flow (MGF), and show that it tends closer to ridge than gradient descent (GD) does [Ali et al., 2019] for least squares regression. Moreover, we prove that, under the calibration $t=\sqrt{2/\lambda}$, where $t$ is the time parameter in MGF and $\lambda$ is the tuning parameter in ridge regression, the risk of MGF is no more than 1.54 times that of ridge. In particular, the relative Bayes risk of MGF to ridge is between 1 and 1.035 under optimal tuning. Numerical experiments strongly support our theoretical results.
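The comparison the abstract describes can be illustrated numerically. The following is a minimal sketch, not the paper's method: it runs a heavy-ball discretization of a momentum gradient flow on a least-squares loss, stops at the calibrated time $t=\sqrt{2/\lambda}$, and measures how far the iterate is from the ridge solution with tuning parameter $\lambda$. The step size, damping coefficient, and loss scaling are all assumptions for illustration; the paper's exact MGF dynamics and risk calculation may differ.

```python
import numpy as np

# Synthetic least-squares problem (dimensions and noise level are arbitrary).
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
beta_true = rng.normal(size=d)
y = X @ beta_true + 0.5 * rng.normal(size=n)

lam = 1.0
# Ridge estimate for the loss ||y - Xb||^2 / (2n) + (lam/2)||b||^2
# (this scaling convention is an assumption).
beta_ridge = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

# Heavy-ball discretization of a momentum gradient flow
#   b'' + friction * b' = -grad L(b),  L(b) = ||y - Xb||^2 / (2n),
# run until the calibrated time t = sqrt(2/lam) from the abstract.
h = 0.01          # Euler step size (assumed)
friction = 1.0    # damping coefficient (assumed)
t_stop = np.sqrt(2.0 / lam)
b = np.zeros(d)
v = np.zeros(d)
t = 0.0
while t < t_stop:
    grad = X.T @ (X @ b - y) / n
    v += h * (-friction * v - grad)
    b += h * v
    t += h

# Relative distance between the early-stopped momentum iterate and ridge.
rel = np.linalg.norm(b - beta_ridge) / np.linalg.norm(beta_ridge)
print("relative distance to ridge:", rel)
```

Under this toy setup the early-stopped momentum iterate lands in the vicinity of the ridge solution rather than the unregularized least-squares solution, which is the qualitative behavior the paper quantifies with its risk bounds.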
