Paper Title
Behind the Scenes of Gradient Descent: A Trajectory Analysis via Basis Function Decomposition
Paper Authors
Paper Abstract
This work analyzes the solution trajectory of gradient-based algorithms via a novel basis function decomposition. We show that, although solution trajectories of gradient-based algorithms may vary depending on the learning task, they behave almost monotonically when projected onto an appropriate orthonormal function basis. Such projection gives rise to a basis function decomposition of the solution trajectory. Theoretically, we use our proposed basis function decomposition to establish the convergence of gradient descent (GD) on several representative learning tasks. In particular, we improve the convergence of GD on symmetric matrix factorization and provide a completely new convergence result for the orthogonal symmetric tensor decomposition. Empirically, we illustrate the promise of our proposed framework on realistic deep neural networks (DNNs) across different architectures, gradient-based solvers, and datasets. Our key finding is that gradient-based algorithms monotonically learn the coefficients of a particular orthonormal function basis of DNNs defined as the eigenvectors of the conjugate kernel after training. Our code is available at https://github.com/jianhaoma/function-basis-decomposition.
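To make the abstract's diagnostic concrete, below is a minimal sketch of tracking a network's output in a conjugate-kernel eigenbasis along the gradient-descent trajectory. It rests on two assumptions that are mine, not statements from the paper or its released code: (i) the conjugate kernel is taken to be the Gram matrix of penultimate-layer features on the training inputs, and (ii) the "after training" basis is computed from a separately trained copy of the same model. All function and variable names are illustrative.

```python
import copy
import torch
import torch.nn as nn

def penultimate_features(model: nn.Sequential, X: torch.Tensor) -> torch.Tensor:
    """Features right before the final linear layer."""
    return model[:-1](X)

def conjugate_kernel_basis(model: nn.Sequential, X: torch.Tensor) -> torch.Tensor:
    """Eigenvectors of K = F F^T on the training set, largest eigenvalue first."""
    with torch.no_grad():
        F = penultimate_features(model, X)
        K = F @ F.T                          # (n, n) conjugate kernel (assumed form)
        _, V = torch.linalg.eigh(K)          # eigh returns ascending eigenvalues
    return V.flip(dims=[1])

def train(model, X, y, steps, lr=1e-2, callback=None):
    """Plain full-batch gradient descent on squared loss."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for step in range(steps):
        opt.zero_grad()
        loss = ((model(X).squeeze(-1) - y) ** 2).mean()
        loss.backward()
        opt.step()
        if callback is not None:
            callback(step)

torch.manual_seed(0)
X, y = torch.randn(200, 10), torch.randn(200)
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

# Basis from a trained copy ("conjugate kernel after training" -- my reading).
trained = copy.deepcopy(model)
train(trained, X, y, steps=2000)
V = conjugate_kernel_basis(trained, X)

# Track coefficients of the fresh model's outputs in that basis along GD.
def log_coefficients(step):
    if step % 200 == 0:
        with torch.no_grad():
            coeffs = V.T @ model(X).squeeze(-1)   # projection onto the eigenbasis
        print(step, coeffs[:5].tolist())          # leading coefficients

train(model, X, y, steps=1000, callback=log_coefficients)
```

If the paper's key finding carries over to this toy setup, the printed leading coefficients should approach their final values nearly monotonically over the trajectory; the paper's own experiments and exact construction are in the repository linked above.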