Paper Title

Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks

Authors

Rama Cont, Alain Rossier, RenYuan Xu

Abstract

We prove linear convergence of gradient descent to a global optimum for the training of deep residual networks with constant layer width and smooth activation function. We show that if the trained weights, as a function of the layer index, admit a scaling limit as the depth increases, then the limit has finite $p$-variation with $p = 2$. Proofs are based on non-asymptotic estimates for the loss function and for norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.
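For context on the regularity claim, finite $p$-variation is the standard notion for a path indexed by the rescaled layer index. Assuming the scaling limit is a weight path $W : [0,1] \to \mathbb{R}^{d \times d}$ (this notation is illustrative and not necessarily the paper's), finite $p$-variation with $p = 2$ means

$$\|W\|_{2\text{-var};[0,1]} = \left( \sup_{0 = t_0 < t_1 < \cdots < t_n = 1} \ \sum_{i=1}^{n} \big\| W_{t_i} - W_{t_{i-1}} \big\|^{2} \right)^{1/2} < \infty,$$

where the supremum is taken over all finite partitions of $[0,1]$. This is weaker than requiring bounded variation ($p = 1$): any path of finite $1$-variation also has finite $2$-variation, but not conversely.

As a purely illustrative companion to the training setting described in the abstract (constant layer width, smooth activation, plain gradient descent), here is a minimal sketch in PyTorch. The $1/L$ residual scaling, initialization, data, and step size are assumptions made for the example and are not taken from the paper.

```python
# Minimal sketch of the setting: constant-width residual network, smooth (tanh)
# activation, trained by full-batch gradient descent on a toy regression task.
# The 1/L residual scaling, initialization, data, and step size are assumptions
# for illustration, not the paper's exact parameterization.
import torch

torch.manual_seed(0)

d, L, n = 16, 64, 128                      # layer width, depth, sample count
X = torch.randn(n, d)                      # toy inputs
y = torch.sin(X.sum(dim=1, keepdim=True))  # smooth scalar targets

# One d x d weight matrix per residual layer (constant width).
W = [(torch.randn(d, d) / d ** 0.5).requires_grad_() for _ in range(L)]
a = torch.randn(d, 1) / d ** 0.5           # fixed linear readout

def forward(x):
    # Residual update h <- h + (1/L) * tanh(h @ W_l); tanh is smooth.
    h = x
    for Wl in W:
        h = h + torch.tanh(h @ Wl) / L
    return h @ a

lr = 0.1
for step in range(500):
    loss = ((forward(X) - y) ** 2).mean()
    grads = torch.autograd.grad(loss, W)
    with torch.no_grad():
        for Wl, g in zip(W, grads):
            Wl -= lr * g                   # plain full-batch gradient descent
    if step % 100 == 0:
        print(f"step {step:4d}  loss {loss.item():.6f}")

# Discrete analogue of the 2-variation of the trained weights along the layer
# index (the quantity whose continuum limit the abstract refers to).
with torch.no_grad():
    var2 = sum(torch.linalg.norm(W[l + 1] - W[l]) ** 2 for l in range(L - 1)) ** 0.5
    print("discrete 2-variation of trained weights:", var2.item())
```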
