Paper Title


The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks

Paper Authors

Daniel Kunin, Atsushi Yamamura, Chao Ma, Surya Ganguli

Abstract


In this work, we explore the maximum-margin bias of quasi-homogeneous neural networks trained with gradient flow on an exponential loss and past a point of separability. We introduce the class of quasi-homogeneous models, which is expressive enough to describe nearly all neural networks with homogeneous activations, even those with biases, residual connections, and normalization layers, while structured enough to enable geometric analysis of its gradient dynamics. Using this analysis, we generalize the existing results of maximum-margin bias for homogeneous networks to this richer class of models. We find that gradient flow implicitly favors a subset of the parameters, unlike in the case of a homogeneous model where all parameters are treated equally. We demonstrate through simple examples how this strong favoritism toward minimizing an asymmetric norm can degrade the robustness of quasi-homogeneous models. On the other hand, we conjecture that this norm-minimization discards, when possible, unnecessary higher-order parameters, reducing the model to a sparser parameterization. Lastly, by applying our theorem to sufficiently expressive neural networks with normalization layers, we reveal a universal mechanism behind the empirical phenomenon of Neural Collapse.
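
As a rough illustration of the scaling structure the abstract alludes to, the following numpy sketch (a toy example of ours, not code from the paper) shows why biases push a ReLU network outside the strictly homogeneous class while still admitting a quasi-homogeneous rescaling: scaling layer-l biases by α^l and all weights by α rescales the output exactly by α^L, whereas scaling every parameter uniformly does not.

```python
# Minimal sketch (assumed toy setup, not the paper's code): a 3-layer ReLU
# network with biases. Uniformly scaling every parameter by alpha does NOT
# scale the output by alpha**L, but the quasi-homogeneous rescaling
# (weights -> alpha * W_l, biases -> alpha**l * b_l) does.
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

# Arbitrary small shapes for the toy network.
W = [rng.standard_normal((8, 4)), rng.standard_normal((8, 8)), rng.standard_normal((1, 8))]
b = [rng.standard_normal(8), rng.standard_normal(8), rng.standard_normal(1)]
x = rng.standard_normal(4)

def forward(W, b, x):
    h = x
    for Wl, bl in zip(W[:-1], b[:-1]):
        h = relu(Wl @ h + bl)
    return W[-1] @ h + b[-1]

alpha, L = 2.5, len(W)

# Naive "homogeneous" rescaling: scale every weight and bias by alpha.
naive = forward([alpha * Wl for Wl in W], [alpha * bl for bl in b], x)

# Quasi-homogeneous rescaling: weights get exponent 1, the layer-l bias
# gets exponent l, and the output scales exactly by alpha**L.
quasi = forward([alpha * Wl for Wl in W],
                [alpha ** (l + 1) * bl for l, bl in enumerate(b)], x)

print("alpha**L * f(x)  :", alpha ** L * forward(W, b, x))
print("naive rescaling  :", naive)   # generally differs
print("quasi rescaling  :", quasi)   # matches alpha**L * f(x)
```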
