Paper Title

Notes on Worst-case Inefficiency of Gradient Descent Even in R^2

Authors

Zuo, Shiliang

Abstract

Gradient descent is a popular algorithm in optimization, and its performance in convex settings is mostly well understood. In non-convex settings, it has been shown that gradient descent escapes saddle points asymptotically and converges to local minimizers [Lee et al. 2016]. Recent studies also show that a perturbed version of gradient descent suffices to escape saddle points efficiently [Ge et al. 2015, Jin et al. 2017]. In this paper we show a negative result: gradient descent may take exponential time to escape saddle points, even for non-pathological two-dimensional functions. While our focus is theoretical, we also conduct experiments verifying our theoretical results. Through our analysis we demonstrate that stochasticity is essential to escaping saddle points efficiently.
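The phenomenon the abstract describes is easiest to see on a toy strict saddle. Below is a minimal Python sketch (our own illustration, not the paper's worst-case construction): it contrasts plain gradient descent, which never leaves the saddle's stable manifold, with a perturbed variant in the spirit of [Ge et al. 2015, Jin et al. 2017]. The test function f(x, y) = x^2 - y^2, the starting point, and all step-size and noise parameters are assumptions chosen for illustration.

```python
import numpy as np

# A minimal sketch (not the paper's hard instance): compare plain gradient
# descent against a perturbed variant near the strict saddle of
# f(x, y) = x^2 - y^2, whose only stationary point is the saddle at the origin.

def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])  # gradient of x^2 - y^2

def run(p0, steps=300, eta=0.01, noise=0.0, rng=None):
    """Run gradient descent from p0, optionally adding Gaussian noise each step."""
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        p = p - eta * grad(p)
        if noise > 0.0:
            p = p + noise * rng.standard_normal(2)  # perturbation step
    return p

rng = np.random.default_rng(0)
start = [0.1, 0.0]  # initialized on the stable manifold of the saddle (y = 0)

plain = run(start)                           # y stays exactly 0: trapped at the saddle
perturbed = run(start, noise=1e-3, rng=rng)  # noise leaves the manifold, |y| grows

print("plain GD:    ", plain)
print("perturbed GD:", perturbed)
```

The design point: plain gradient descent preserves y = 0 exactly, so it converges to the saddle and never escapes, whereas any perturbation gives the unstable direction a nonzero component that then grows geometrically, matching the abstract's claim that stochasticity is essential.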
