Paper Title
Reparametrizing gradient descent
Paper Authors
Paper Abstract
In this work, we propose an optimization algorithm which we call norm-adapted gradient descent. This algorithm is similar to other gradient-based optimization algorithms such as Adam or Adagrad in that it adapts the learning rate of stochastic gradient descent at each iteration. However, rather than using statistical properties of observed gradients, norm-adapted gradient descent relies on a first-order estimate of the effect of a standard gradient descent update step, much like the Newton-Raphson method in many dimensions. Our algorithm can also be compared to quasi-Newton methods, but it seeks roots rather than stationary points. Seeking roots is justified by the fact that, for models with sufficient capacity measured by a nonnegative loss function, the roots of the loss coincide with global optima. This work presents several experiments using our algorithm; in these results, norm-adapted descent appears particularly strong in regression settings, but it is also capable of training classifiers.
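
The abstract does not state the update rule explicitly, but one natural reading of "a first-order estimate of the effect of a standard gradient descent update step" that seeks a root of a nonnegative loss is a Newton-Raphson-style step size eta = L(theta) / ||grad L(theta)||^2, i.e. the step that drives a linear model of the loss along the negative gradient to zero (the Polyak step size with a target value of zero). The sketch below, in plain NumPy, illustrates that reading on a toy regression problem; the setup, names, and exact update are our assumptions, not necessarily the paper's algorithm.

    # Sketch of a norm-adapted (Newton-Raphson-style) learning rate for a
    # nonnegative loss. Assumed form: eta_t = L(theta_t) / ||grad L(theta_t)||^2,
    # which is one plausible reading of the abstract, not the paper's exact method.
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy linear regression with a realizable target, so the loss has a root
    # and that root is a global optimum.
    X = rng.normal(size=(100, 10))
    theta_true = rng.normal(size=10)
    y = X @ theta_true

    def loss_and_grad(theta):
        residual = X @ theta - y
        loss = 0.5 * np.mean(residual ** 2)   # nonnegative loss
        grad = X.T @ residual / X.shape[0]
        return loss, grad

    theta = np.zeros(10)
    for step in range(200):
        loss, grad = loss_and_grad(theta)
        grad_norm_sq = float(grad @ grad)
        if grad_norm_sq < 1e-24:              # near a root; avoid dividing by ~0
            break
        eta = loss / grad_norm_sq             # norm-adapted learning rate (assumed)
        theta = theta - eta * grad
        if step % 50 == 0:
            print(f"step {step:3d}  loss {loss:.3e}  eta {eta:.3e}")

    print("final loss:", loss_and_grad(theta)[0])

Under this reading, the step size grows when the loss is large relative to the squared gradient norm and shrinks as the iterate approaches a root, which is consistent with the abstract's framing of root-seeking on a nonnegative loss; the paper's actual update may differ in detail.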