Paper Title

Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks

Authors

Rama Cont, Alain Rossier, RenYuan Xu

Abstract

We prove linear convergence of gradient descent to a global optimum for the training of deep residual networks with constant layer width and smooth activation function. We show that if the trained weights, as a function of the layer index, admit a scaling limit as the depth increases, then the limit has finite $p$-variation with $p = 2$. Proofs are based on non-asymptotic estimates for the loss function and for norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.
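For context on the regularity claim, finite $p$-variation is the standard notion for a path indexed by the rescaled layer index. Assuming the scaling limit is a weight path $W : [0,1] \to \mathbb{R}^{d \times d}$ (this notation is illustrative and not necessarily the paper's), finite $p$-variation with $p = 2$ means

$$\|W\|_{2\text{-var};[0,1]} = \left( \sup_{0 = t_0 < t_1 < \cdots < t_n = 1} \ \sum_{i=1}^{n} \big\| W_{t_i} - W_{t_{i-1}} \big\|^{2} \right)^{1/2} < \infty,$$

where the supremum is taken over all finite partitions of $[0,1]$. This is weaker than requiring bounded variation ($p = 1$): any path of finite $1$-variation also has finite $2$-variation, but not conversely.

As a purely illustrative companion to the training setting described in the abstract (constant layer width, smooth activation, plain gradient descent), here is a minimal sketch in PyTorch. The $1/L$ residual scaling, initialization, data, and step size are assumptions made for the example and are not taken from the paper.

```python
# Minimal sketch of the setting: constant-width residual network, smooth (tanh)
# activation, trained by full-batch gradient descent on a toy regression task.
# The 1/L residual scaling, initialization, data, and step size are assumptions
# for illustration, not the paper's exact parameterization.
import torch

torch.manual_seed(0)

d, L, n = 16, 64, 128                      # layer width, depth, sample count
X = torch.randn(n, d)                      # toy inputs
y = torch.sin(X.sum(dim=1, keepdim=True))  # smooth scalar targets

# One d x d weight matrix per residual layer (constant width).
W = [(torch.randn(d, d) / d ** 0.5).requires_grad_() for _ in range(L)]
a = torch.randn(d, 1) / d ** 0.5           # fixed linear readout

def forward(x):
    # Residual update h <- h + (1/L) * tanh(h @ W_l); tanh is smooth.
    h = x
    for Wl in W:
        h = h + torch.tanh(h @ Wl) / L
    return h @ a

lr = 0.1
for step in range(500):
    loss = ((forward(X) - y) ** 2).mean()
    grads = torch.autograd.grad(loss, W)
    with torch.no_grad():
        for Wl, g in zip(W, grads):
            Wl -= lr * g                   # plain full-batch gradient descent
    if step % 100 == 0:
        print(f"step {step:4d}  loss {loss.item():.6f}")

# Discrete analogue of the 2-variation of the trained weights along the layer
# index (the quantity whose continuum limit the abstract refers to).
with torch.no_grad():
    var2 = sum(torch.linalg.norm(W[l + 1] - W[l]) ** 2 for l in range(L - 1)) ** 0.5
    print("discrete 2-variation of trained weights:", var2.item())
```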
