Paper Title
ResRep: Lossless CNN Pruning via Decoupling Remembering and Forgetting
Paper Authors
Paper Abstract
We propose ResRep, a novel method for lossless channel pruning (a.k.a. filter pruning), which slims down a CNN by reducing the width (number of output channels) of convolutional layers. Inspired by the neurobiology research about the independence of remembering and forgetting, we propose to re-parameterize a CNN into the remembering parts and forgetting parts, where the former learn to maintain the performance and the latter learn to prune. Via training with regular SGD on the former but a novel update rule with penalty gradients on the latter, we realize structured sparsity. Then we equivalently merge the remembering and forgetting parts into the original architecture with narrower layers. In this sense, ResRep can be viewed as a successful application of Structural Re-parameterization. Such a methodology distinguishes ResRep from the traditional learning-based pruning paradigm that applies a penalty on parameters to produce sparsity, which may suppress the parameters essential for the remembering. ResRep slims down a standard ResNet-50 with 76.15% accuracy on ImageNet to a narrower one with only 45% FLOPs and no accuracy drop, which is the first to achieve lossless pruning with such a high compression ratio. The code and models are at https://github.com/DingXiaoH/ResRep.
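The "equivalently merge" step rests on the linearity of convolution: in ResRep, each forgetting part is a 1x1 convolution (a "compactor") appended after a conv layer, and after pruning the compactor's output channels, the pair collapses into a single narrower conv. A minimal numpy sketch of that merge (the `conv2d` helper and variable names are illustrative, not from the paper's code):

```python
import numpy as np

def conv2d(x, w):
    # Naive "valid" cross-correlation: x is (C_in, H, W), w is (C_out, C_in, k, k).
    c_out, _, k, _ = w.shape
    H, W = x.shape[1], x.shape[2]
    out = np.zeros((c_out, H - k + 1, W - k + 1))
    for o in range(c_out):
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[o, i, j] = np.sum(x[:, i:i + k, j:j + k] * w[o])
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))
w = rng.standard_normal((4, 3, 3, 3))  # remembering part: 3x3 conv, 4 output channels
q = rng.standard_normal((2, 4, 1, 1))  # forgetting part: 1x1 compactor keeping 2 channels

# Two-stage forward pass: conv, then 1x1 compactor.
y_two_stage = conv2d(conv2d(x, w), q)

# Merged kernel: W'[o, i] = sum_c q[o, c] * w[c, i], yielding a narrower 3x3 conv.
w_merged = np.einsum('ocab,cikl->oikl', q, w)
y_merged = conv2d(x, w_merged)

print(np.allclose(y_two_stage, y_merged))  # True: the merge is exact
```

Because the merge is exact, the pruned model's outputs are identical to the trained re-parameterized model's, so all accuracy loss (if any) comes from training, not from the conversion.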