Paper Title
Compressing Models with Few Samples: Mimicking then Replacing
Paper Authors
Paper Abstract
Few-sample compression aims to compress a big redundant model into a small compact one with only a few samples. If we fine-tune the model directly on these limited samples, it will easily overfit and learn almost nothing. Hence, previous methods optimize the compressed model layer by layer and try to make every layer produce the same outputs as the corresponding layer in the teacher model, which is cumbersome. In this paper, we propose a new framework named Mimicking then Replacing (MiR) for few-sample compression, which first urges the pruned model to output the same features as the teacher's in the penultimate layer, and then replaces the teacher's layers before the penultimate one with the well-tuned compact model. Unlike previous layer-wise reconstruction methods, our MiR optimizes the entire network holistically, which is not only simple and effective, but also unsupervised and general. MiR outperforms previous methods by large margins. Code will be available soon.
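The mimicking stage described in the abstract can be summarized in a short sketch. The snippet below is a minimal illustration, assuming a PyTorch setup in which teacher and student both return penultimate-layer features (e.g., with their classifier heads detached); the function name, the MSE mimicking loss, and the optimizer settings are illustrative assumptions rather than the authors' released implementation.

    # Minimal sketch of the mimicking stage (assumed details, not the official code).
    import torch
    import torch.nn as nn

    def mimic_penultimate(teacher: nn.Module,
                          student: nn.Module,
                          loader,            # the few available samples
                          epochs: int = 100,
                          lr: float = 1e-3) -> nn.Module:
        """Train the pruned student so its penultimate features match the teacher's."""
        teacher.eval()
        student.train()
        criterion = nn.MSELoss()  # assumed mimicking loss
        optimizer = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)

        for _ in range(epochs):
            for x, _ in loader:  # labels are ignored: this stage is unsupervised
                with torch.no_grad():
                    t_feat = teacher(x)  # teacher's penultimate features
                s_feat = student(x)      # student's penultimate features
                loss = criterion(s_feat, t_feat)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return student

After this stage, the replacing step would reattach the teacher's layers after the penultimate one (e.g., its classifier head) on top of the tuned student backbone, so no labels are needed at any point.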