Paper Title
The Directional Bias Helps Stochastic Gradient Descent to Generalize in Kernel Regression Models
Paper Authors
Paper Abstract
We study the Stochastic Gradient Descent (SGD) algorithm in nonparametric statistics, kernel regression in particular. The directional bias property of SGD, known in the linear regression setting, is generalized to kernel regression. More specifically, we prove that SGD with a moderate, annealed step size converges along the direction of the eigenvector that corresponds to the largest eigenvalue of the Gram matrix. In addition, Gradient Descent (GD) with a moderate or small step size converges along the direction of the eigenvector that corresponds to the smallest eigenvalue. These facts are referred to as the directional bias properties; they help explain why an SGD-computed estimator can have a smaller generalization error than a GD-computed estimator. The application of our theory is demonstrated by simulation studies and a case study based on the FashionMNIST dataset.
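The directional bias claim can be probed numerically. Below is a minimal sketch, not the paper's experiment: it fits a toy RBF-kernel regression by GD (small constant step size) and by SGD (moderate step size, crudely annealed halfway), and uses the cosine alignment of the prediction residual with the extreme eigenvectors of the Gram matrix as a proxy for the convergence direction. The kernel bandwidth, step-size choices, and iteration counts are illustrative assumptions, and the toy run is not guaranteed to reproduce the paper's quantitative results.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D kernel regression problem.
n = 60
X = np.sort(rng.uniform(-2, 2, size=n))
y = np.sin(2 * X) + 0.1 * rng.normal(size=n)

# RBF Gram matrix K[i, j] = exp(-(x_i - x_j)^2 / (2 h^2)); bandwidth h is an illustrative choice.
h = 0.5
K = np.exp(-(X[:, None] - X[None, :]) ** 2 / (2 * h ** 2))

eigvals, eigvecs = np.linalg.eigh(K)          # eigenvalues in ascending order
v_min, v_max = eigvecs[:, 0], eigvecs[:, -1]  # eigenvectors of the smallest / largest eigenvalue

def alignment(alpha):
    """Cosine alignment of the prediction residual K @ alpha - y with v_min and v_max."""
    r = K @ alpha - y
    r = r / np.linalg.norm(r)
    return abs(r @ v_min), abs(r @ v_max)

# Gradient Descent on the full squared loss (1/2n) * ||y - K @ alpha||^2, small constant step size.
alpha_gd = np.zeros(n)
eta_gd = 0.25 / eigvals[-1]
for _ in range(20000):
    alpha_gd -= eta_gd * K @ (K @ alpha_gd - y) / n

# SGD on single-sample losses (1/2) * (y_i - k_i @ alpha)^2, moderate step size annealed halfway.
alpha_sgd = np.zeros(n)
eta_sgd = 1.0 / eigvals[-1]
T = 200000
for t in range(T):
    if t == T // 2:
        eta_sgd /= 100.0                      # crude annealing schedule, illustrative only
    i = rng.integers(n)
    k_i = K[i]                                # kernel features of the sampled point
    alpha_sgd -= eta_sgd * (k_i @ alpha_sgd - y[i]) * k_i

print("GD  residual alignment (v_min, v_max):", alignment(alpha_gd))
print("SGD residual alignment (v_min, v_max):", alignment(alpha_sgd))
```

Under the directional bias properties, the GD residual is expected to align mostly with v_min (its slowest-decaying mode), while the SGD residual after annealing is expected to align with v_max; the printed alignments give a quick sanity check of that contrast on this toy problem.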