Paper Title

PHEW: Constructing Sparse Networks that Learn Fast and Generalize Well without Training Data

Paper Authors

Shreyas Malakarjun Patil, Constantine Dovrolis

Paper Abstract

Methods that sparsify a network at initialization are important in practice because they greatly improve the efficiency of both learning and inference. Our work is based on a recently proposed decomposition of the Neural Tangent Kernel (NTK) that decouples the dynamics of the training process into a data-dependent component and an architecture-dependent kernel, the latter referred to as the Path Kernel. That work showed how to design sparse neural networks for faster convergence, without any training data, using the SynFlow-L2 algorithm. We first show that even though SynFlow-L2 is optimal in terms of convergence, for a given network density it results in sub-networks with "bottleneck" (narrow) layers, leading to poor performance compared to other data-agnostic methods that use the same number of parameters. We then propose a new method to construct sparse networks without any training data, referred to as Paths with Higher Edge-Weights (PHEW). PHEW is a probabilistic network formation method based on biased random walks that depend only on the initial weights. It has path kernel properties similar to those of SynFlow-L2 but generates much wider layers, resulting in better generalization and performance. PHEW achieves significant improvements over the data-independent SynFlow and SynFlow-L2 methods across a wide range of network densities.
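The abstract only summarizes the walk-based construction, so the sketch below gives a rough, illustrative picture (not the authors' implementation) of how biased random walks can build a sparse mask for an MLP: each walk starts at an input unit and crosses every layer, stepping to the next unit with probability proportional to the magnitude of the connecting initial weight, and every edge it traverses is kept. The function name `phew_style_mask`, the `(fan_out, fan_in)` weight layout, and the uniform choice of starting unit are assumptions for illustration; the paper's actual procedure involves further details (e.g., walk direction and the precise probability scheme).

```python
import numpy as np

def phew_style_mask(weights, target_params, rng=None):
    """Illustrative sketch of a PHEW-style biased random walk (assumed
    details, not the authors' code).

    weights: list of initial weight matrices for an MLP; weights[l] has
             shape (fan_out_l, fan_in_l), chained input -> output.
    target_params: number of weights to keep in the sparse sub-network
                   (assumed <= number of edges reachable by walks).
    Returns a list of binary masks, one per layer.
    """
    rng = rng or np.random.default_rng(0)
    masks = [np.zeros_like(w) for w in weights]
    kept = 0
    while kept < target_params:
        # Start each walk at a uniformly chosen input unit (an assumption).
        unit = rng.integers(weights[0].shape[1])
        for l, w in enumerate(weights):
            # Step across the layer with probability proportional to the
            # magnitude of the connecting initial weight: the walk's bias.
            p = np.abs(w[:, unit])
            p = p / p.sum()
            nxt = rng.choice(w.shape[0], p=p)
            if masks[l][nxt, unit] == 0:  # count each kept edge only once
                masks[l][nxt, unit] = 1.0
                kept += 1
            unit = nxt
    return masks
```

One way to read the design: high-magnitude weights attract many walks, but the starting units are spread uniformly across the input layer, so the kept edges tend to be distributed over many units in every layer. This is consistent with the abstract's claim that PHEW produces wider layers, and hence better generalization, than Synflow-L2 at the same parameter budget.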
