Paper title
Doubly infinite residual neural networks: a diffusion process approach
Paper authors
Paper abstract
Modern neural networks (NNs) featuring a large number of layers (depth) and units per layer (width) have achieved remarkable performance across many domains. While there exists a vast literature on the interplay between infinitely wide NNs and Gaussian processes, little is known about analogous interplays with respect to infinitely deep NNs. NNs with independent and identically distributed (i.i.d.) initializations exhibit undesirable forward and backward propagation properties as the number of layers increases. To overcome these drawbacks, Peluchetti and Favaro (2020) considered fully-connected residual networks (ResNets) with the network's parameters initialized by means of distributions that shrink as the number of layers increases, thus establishing an interplay between infinitely deep ResNets and solutions to stochastic differential equations, i.e. diffusion processes, and showing that infinitely deep ResNets do not suffer from undesirable forward-propagation properties. In this paper, we review the results of Peluchetti and Favaro (2020), extending them to convolutional ResNets, and we establish analogous backward-propagation results, which directly relate to the problem of training fully-connected deep ResNets. Then, we investigate the more general setting of doubly infinite NNs, where both the network's width and depth grow unboundedly. We focus on doubly infinite fully-connected ResNets, for which we consider i.i.d. initializations. Under this setting, we show that the dynamics of quantities of interest converge, at initialization, to deterministic limits. This allows us to provide analytical expressions for inference, both in the case of weakly trained and fully trained ResNets. Our results highlight the limited expressive power of doubly infinite ResNets when the unscaled network's parameters are i.i.d. and the residual blocks are shallow.
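The following is a minimal illustrative sketch (not the authors' code) of the kind of depth-dependent initialization the abstract refers to: a fully-connected residual block whose Gaussian parameters have variance shrinking like 1/depth, so that each layer acts as an Euler-type step of size 1/depth of a stochastic differential equation and forward propagation stays stable as depth grows. The block form, the tanh activation, the variance constants, and all names (`resnet_forward`, `sigma_w`, `sigma_b`) are assumptions made for illustration, not the paper's exact construction.

```python
# Illustrative sketch, assuming a residual block of the form
# x_{l+1} = x_l + W2_l tanh(W1_l x_l) + b_l, with the increment's
# parameters scaled so its variance is O(1/depth) per layer.
import numpy as np


def resnet_forward(x, depth, width, sigma_w=1.0, sigma_b=0.1, seed=None):
    """Forward pass through `depth` residual blocks with 1/depth-scaled
    Gaussian initializations (an Euler-type discretization of a diffusion)."""
    rng = np.random.default_rng(seed)
    h = x.copy()
    for _ in range(depth):
        # Inner weights use standard fan-in scaling (O(1) pre-activations).
        w1 = rng.normal(0.0, sigma_w / np.sqrt(width), size=(width, width))
        # Outer weights and biases shrink with depth: variance O(1/depth).
        w2 = rng.normal(0.0, sigma_w / np.sqrt(depth * width), size=(width, width))
        b = rng.normal(0.0, sigma_b / np.sqrt(depth), size=width)
        h = h + w2 @ np.tanh(w1 @ h) + b
    return h


# The hidden-state norm remains of order one as depth grows, in contrast
# with unscaled i.i.d. initializations, whose forward signal explodes or
# collapses with depth.
width = 64
x0 = np.random.default_rng(0).normal(size=width)
for depth in (10, 100, 1000):
    print(depth, np.linalg.norm(resnet_forward(x0, depth, width, seed=1)))
```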