Paper title
Almost sure convergence rates for Stochastic Gradient Descent and Stochastic Heavy Ball
Paper authors
Paper abstract
We study stochastic gradient descent (SGD) and the stochastic heavy ball method (SHB, otherwise known as the momentum method) for the general stochastic approximation problem. For SGD, in the convex and smooth setting, we provide the first \emph{almost sure} asymptotic convergence \emph{rates} for a weighted average of the iterates. More precisely, we show that the convergence rate of the function values is arbitrarily close to $o(1/\sqrt{k})$, and is exactly $o(1/k)$ in the so-called overparametrized case. We show that these results still hold when using stochastic line search and stochastic Polyak stepsizes, thereby giving the first proof of convergence of these methods in the non-overparametrized regime. Using a substantially different analysis, we show that these rates hold for SHB as well, but at the last iterate. This distinction is important because it is the last iterate of SGD and SHB which is used in practice. We also show that the last iterate of SHB converges to a minimizer \emph{almost surely}. Additionally, we prove that the function values of the deterministic HB converge at a $o(1/k)$ rate, which is faster than the previously known $O(1/k)$. Finally, in the nonconvex setting, we prove similar rates on the lowest gradient norm along the trajectory of SGD.
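For reference, the two methods named in the abstract follow the standard updates sketched below; the particular stepsizes $\gamma_k$, momentum parameters $\beta_k$, and averaging weights $w_t$ analyzed in the paper are specified in the full text, so the symbols here are generic placeholders rather than the authors' exact choices.
% Minimal sketch of the standard updates, assuming \nabla f_{i_k}(x_k) denotes
% a stochastic gradient sampled at iteration k; gamma_k, beta_k, and w_t are
% placeholders, not the paper's specific parameter choices.
\begin{align*}
  \text{SGD:} \quad & x_{k+1} = x_k - \gamma_k \nabla f_{i_k}(x_k), \\
  \text{SHB:} \quad & x_{k+1} = x_k - \gamma_k \nabla f_{i_k}(x_k) + \beta_k (x_k - x_{k-1}), \\
  \text{weighted average of SGD iterates:} \quad & \bar{x}_k = \frac{\sum_{t=0}^{k} w_t\, x_t}{\sum_{t=0}^{k} w_t}.
\end{align*}
The abstract's SGD rates are stated for the weighted average $\bar{x}_k$, while the SHB rates are stated for the last iterate $x_k$.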