Title
On the influence of stochastic roundoff errors and their bias on the convergence of the gradient descent method with low-precision floating-point computation
Authors
Abstract
When the gradient descent method is implemented in low precision, stochastic rounding schemes help prevent the stagnation of convergence caused by the vanishing-gradient effect. Unbiased stochastic rounding achieves zero bias by preserving small updates with probability proportional to their relative magnitude. This study provides a theoretical explanation for the stagnation of the gradient descent method in low-precision computation. Additionally, we propose two new stochastic rounding schemes that trade the zero-bias property for a larger probability of preserving small gradients. Our methods yield a constant rounding bias that, on average, lies in a descent direction. For convex problems, we prove that the proposed rounding methods typically have a beneficial effect on the convergence rate of gradient descent. We validate our theoretical analysis by comparing the performance of various rounding schemes when optimizing a multinomial logistic regression model and when training a simple neural network in an 8-bit floating-point format.
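To illustrate the unbiased scheme the abstract refers to, here is a minimal sketch of stochastic rounding to a uniform grid (an assumption for simplicity; the paper works with a low-precision floating-point format, where the grid spacing varies with the exponent). A value is rounded up with probability equal to its fractional distance from the lower grid point, so the expected rounded value equals the input and a small update survives rounding with probability proportional to its magnitude:

```python
import math
import random

def stochastic_round(x: float, ulp: float) -> float:
    """Round x to a multiple of ulp, rounding up with probability
    equal to the fractional distance from the lower grid point.
    The expected result equals x, i.e. the rounding is unbiased."""
    lo = math.floor(x / ulp) * ulp   # nearest grid point below x
    frac = (x - lo) / ulp            # distance to lo, in [0, 1)
    return lo + ulp if random.random() < frac else lo

# A small update of 0.3 on a grid of spacing 1.0 would always be lost
# under round-to-nearest; stochastically it survives 30% of the time.
random.seed(0)
samples = [stochastic_round(0.3, 1.0) for _ in range(10_000)]
mean = sum(samples) / len(samples)   # close to 0.3 on average
```

This zero-mean behavior is exactly what the proposed biased schemes give up: they would round small positive updates up more often than `frac` dictates, introducing a bias that, per the paper, lies on average in a descent direction.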