Paper Title
Picking Winning Tickets Before Training by Preserving Gradient Flow
Paper Authors
Paper Abstract
Overparameterization has been shown to benefit both the optimization and generalization of neural networks, but large networks are resource hungry at both training and test time. Network pruning can reduce test-time resource requirements, but is typically applied to trained networks and therefore cannot avoid the expensive training process. We aim to prune networks at initialization, thereby saving resources at training time as well. Specifically, we argue that efficient training requires preserving the gradient flow through the network. This leads to a simple but effective pruning criterion we term Gradient Signal Preservation (GraSP). We empirically investigate the effectiveness of the proposed method with extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet, using VGGNet and ResNet architectures. Our method can prune 80% of the weights of a VGG-16 network on ImageNet at initialization, with only a 1.6% drop in top-1 accuracy. Moreover, our method achieves significantly better performance than the baseline at extreme sparsity levels.
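A minimal sketch of the gradient-flow-preserving score is given below for illustration, assuming a PyTorch classification model. The function name `grasp_prune_masks`, the single-batch scoring, and the unnormalized threshold are simplifying assumptions rather than the authors' exact implementation; the core idea is to score each weight by how its removal would change the gradient norm, estimated with a Hessian-vector product obtained by double backpropagation.

```python
# Sketch of a GraSP-style pruning score at initialization (assumed PyTorch interface).
import torch
import torch.nn.functional as F

def grasp_prune_masks(model, inputs, targets, sparsity=0.8):
    """Return per-layer keep masks that prune `sparsity` of the weights while
    preserving gradient flow, scoring each weight by -theta * (H g)."""
    weights = [p for p in model.parameters() if p.dim() > 1]  # conv/linear weights

    # First backward pass: gradient g of the training loss w.r.t. the weights.
    loss = F.cross_entropy(model(inputs), targets)
    grads = torch.autograd.grad(loss, weights, create_graph=True)

    # Second backward pass: differentiating sum(stop_grad(g) * g) w.r.t. the
    # weights yields the Hessian-vector product H g.
    gnorm = sum((g.detach() * g).sum() for g in grads)
    hessian_grad = torch.autograd.grad(gnorm, weights)

    # Score each weight by -theta * (H g): a high score means removing that
    # weight increases (or least decreases) the gradient norm, so it is pruned.
    scores = [-w.detach() * hg for w, hg in zip(weights, hessian_grad)]
    flat = torch.cat([s.flatten() for s in scores])
    num_to_prune = int(sparsity * flat.numel())
    threshold = torch.topk(flat, num_to_prune, sorted=True).values[-1]
    return [(s < threshold).float() for s in scores]
```

The resulting masks would then be applied multiplicatively to the weights (and their gradients) before training begins, so the pruned network is trained from initialization at the target sparsity.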