Paper Title

A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix Factorization

Authors

Jian Cao, Chen Qian, Yihui Huang, Dicheng Chen, Yuncheng Gao, Jiyang Dong, Di Guo, Xiaobo Qu

Abstract

Implicit regularization is an important way to interpret neural networks. Recent theory has started to explain implicit regularization with the deep matrix factorization (DMF) model and to analyze the trajectory of discrete gradient dynamics in the optimization process. These discrete gradient dynamics take steps that are relatively small but not infinitesimal, and thus fit the practical implementation of neural networks well. Currently, discrete gradient dynamics analysis has been successfully applied to shallow networks but runs into prohibitively complex computation for deep networks. In this work, we introduce another discrete gradient dynamics approach to explain implicit regularization, i.e., landscape analysis. It mainly focuses on special gradient regions, such as saddle points and local minima. We theoretically establish the connection between saddle point escaping (SPE) stages and the matrix rank in DMF. We prove that, for the reconstruction of a rank-R matrix, DMF converges to a second-order critical point after R stages of SPE. This conclusion is further experimentally verified on a low-rank matrix reconstruction problem. This work provides a new theory for analyzing implicit regularization in deep learning.
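The setting the abstract describes — gradient descent on a deep matrix factorization recovering a low-rank target — can be sketched in a few lines of NumPy. The sketch below is a hypothetical toy illustration, not the paper's experiment: the target rank (R = 2), depth (3), matrix size, learning rate, and near-zero initialization scale are all illustrative assumptions. Starting from a small random initialization, plain gradient descent on the full-observation loss drives the product matrix toward the rank-2 target, and the implicit low-rank bias shows up in the singular values of the learned product.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a random rank-2 target matrix M,
# reconstructed by a depth-3 factorization P = W3 @ W2 @ W1.
n, r, depth = 10, 2, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))

lr, steps, init_scale = 5e-3, 50000, 1e-2  # illustrative hyperparameters
Ws = [init_scale * rng.standard_normal((n, n)) for _ in range(depth)]

def product(Ws):
    """End-to-end product W_depth @ ... @ W_1."""
    P = Ws[0]
    for W in Ws[1:]:
        P = W @ P
    return P

for _ in range(steps):
    G = product(Ws) - M               # dL/dP for L = 0.5 * ||P - M||_F^2
    new_Ws = []
    for i in range(depth):
        left = np.eye(n)              # W_depth @ ... @ W_{i+1}
        for W in Ws[i + 1:]:
            left = W @ left
        right = np.eye(n)             # W_{i-1} @ ... @ W_1
        for W in Ws[:i]:
            right = W @ right
        # Chain rule: dL/dW_i = left^T @ G @ right^T
        new_Ws.append(Ws[i] - lr * (left.T @ G @ right.T))
    Ws = new_Ws

P = product(Ws)
svals = np.linalg.svd(P, compute_uv=False)
rel_err = np.linalg.norm(P - M) / np.linalg.norm(M)
# With near-zero initialization the product stays numerically low-rank:
# only the top R = 2 singular values of P become significant.
```

Tracking `svals` over the course of training (rather than only at the end) is what exposes the stage-wise picture the abstract refers to: each singular direction of the target is picked up after an interval in which the iterate lingers near a saddle.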
