Beyond Random Matrix Theory for Deep Networks

Title

Beyond Random Matrix Theory for Deep Networks

Author

Granziol, Diego

Abstract

We investigate whether the Wigner semi-circle and Marcenko-Pastur distributions, often used in the theoretical analysis of deep neural networks, match empirically observed spectral densities. We find that, even allowing for outliers, the observed spectral shapes strongly deviate from these theoretical predictions. This raises major questions about the usefulness of these models in deep learning. We further show that theoretical results, such as the layered nature of critical points, depend strongly on the exact form of these limiting spectral densities. We consider two new classes of matrix ensembles: products of random Wigner/Wishart ensembles and percolated Wigner/Wishart ensembles, both of which better match the observed spectra. They also give large discrete spectral peaks at the origin, providing a theoretical explanation for the observation that various optima can be connected by one-dimensional paths of low loss. We also show that, in the case of a random matrix product, the weight of the discrete spectral component at $0$ depends on the ratio of the dimensions of the weight matrices.
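Two ingredients of the abstract can be illustrated with a small toy sketch: comparing an empirical eigenvalue histogram against the Wigner semicircle, and observing a discrete mass of zero eigenvalues in a product of rectangular random matrices. This is not the paper's construction. It assumes i.i.d. Gaussian entries, the helper names (`wigner_spectrum`, `semicircle_density`, `histogram_l1_distance`, `product_zero_mass`) are hypothetical, and the zero mass here follows only from a rank bound: for `W = A @ B` with `A` of shape n x m and `B` of shape m x p, the n x n matrix `W @ W.T` has rank at most min(n, m, p), so at least a fraction 1 - min(n, m, p)/n of its eigenvalues vanish. This fraction depends on the ratios of the dimensions.

```python
import numpy as np


def wigner_spectrum(n: int, rng: np.random.Generator) -> np.ndarray:
    """Eigenvalues of an n x n GOE-style Wigner matrix.

    Off-diagonal entries have variance 1/n, so the limiting spectral
    density is the semicircle supported on [-2, 2].
    """
    a = rng.standard_normal((n, n))
    h = (a + a.T) / np.sqrt(2 * n)
    return np.linalg.eigvalsh(h)


def semicircle_density(x: np.ndarray) -> np.ndarray:
    """Semicircle density sqrt(4 - x^2) / (2*pi) on [-2, 2], zero outside."""
    return np.sqrt(np.clip(4.0 - x**2, 0.0, None)) / (2.0 * np.pi)


def histogram_l1_distance(eigs: np.ndarray, density, bins: int = 30) -> float:
    """Approximate L1 distance between the empirical spectral histogram
    and a reference density evaluated at the bin centres."""
    hist, edges = np.histogram(eigs, bins=bins, density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    widths = np.diff(edges)
    return float(np.sum(np.abs(hist - density(centres)) * widths))


def product_zero_mass(n: int, m: int, p: int, rng: np.random.Generator,
                      rel_tol: float = 1e-8) -> float:
    """Fraction of numerically zero eigenvalues of W @ W.T, where
    W = A @ B with Gaussian A (n x m) and B (m x p)."""
    a = rng.standard_normal((n, m)) / np.sqrt(m)
    b = rng.standard_normal((m, p)) / np.sqrt(p)
    w = a @ b
    eigs = np.linalg.eigvalsh(w @ w.T)
    # eigvalsh returns tiny rounding values rather than exact zeros,
    # so count eigenvalues below a cutoff relative to the largest one.
    return float(np.mean(np.abs(eigs) < rel_tol * np.abs(eigs).max()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eigs = wigner_spectrum(1000, rng)
    print("L1 distance to semicircle:",
          histogram_l1_distance(eigs, semicircle_density))
    # rank(A @ B) = min(200, 100, 150) = 100, so half of the 200 eigenvalues vanish.
    print("zero mass:", product_zero_mass(200, 100, 150, rng))  # 0.5
```

The relative tolerance is a design choice for the sketch: an absolute threshold would depend on the arbitrary scaling of `A` and `B`, whereas the gap between rounding-level zeros and the smallest genuine eigenvalue is many orders of magnitude in relative terms.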
