Paper title
Cascaded channel pruning using hierarchical self-distillation
Paper authors
Paper abstract
In this paper, we propose an approach for filter-level pruning with hierarchical knowledge distillation based on the teacher, teaching-assistant, and student framework. Our method makes use of teaching assistants at intermediate pruning levels that share the same architecture and weights as the target student. We propose to prune each model independently using the gradient information from its corresponding teacher. By considering the relative sizes of each student-teacher pair, this formulation provides a natural trade-off between the capacity gap for knowledge distillation and the bias of the filter saliency updates. Our results show improvements in the attainable accuracy and model compression across the CIFAR10 and ImageNet classification tasks using the VGG16 and ResNet50 architectures. We provide an extensive evaluation that demonstrates the benefits of using a varying number of teaching-assistant models of different sizes.
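A minimal sketch of the idea described in the abstract: a chain of teacher, teaching assistant(s), and student, where each smaller model is pruned using filter-saliency scores driven by gradients of a distillation loss taken against its own (next-larger) teacher. The names `distillation_loss`, `filter_saliency`, `prune_step`, `cascaded_prune`, the temperature `T`, the `prune_ratio`, and the Taylor-style saliency criterion are illustrative assumptions, not the authors' exact formulation.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Soft-target KL-divergence distillation loss (assumed Hinton-style form)."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)


def filter_saliency(conv_weight, conv_grad):
    """First-order (Taylor-style) saliency per output filter: sum of |w * dL/dw|."""
    return (conv_weight.detach() * conv_grad).abs().sum(dim=(1, 2, 3))


def prune_step(student, teacher, batch, prune_ratio=0.1):
    """Score the student's filters with gradients of the distillation loss against
    its own teacher, then mask the lowest-saliency filters (hypothetical step)."""
    x, _ = batch
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    loss = distillation_loss(s_logits, t_logits)
    student.zero_grad()
    loss.backward()

    for module in student.modules():
        if isinstance(module, nn.Conv2d) and module.weight.grad is not None:
            scores = filter_saliency(module.weight, module.weight.grad)
            k = max(1, int(prune_ratio * scores.numel()))
            drop = torch.topk(scores, k, largest=False).indices
            with torch.no_grad():
                module.weight[drop] = 0.0  # mask pruned filters; physical removal omitted
    return loss.item()


def cascaded_prune(model, batch, levels=3, prune_ratio=0.1):
    """Chain teacher -> TA -> ... -> student: each freshly pruned copy becomes
    the teacher for the next, smaller level (assumed control flow)."""
    teacher = model
    for _ in range(levels):
        student = copy.deepcopy(teacher)  # TA shares architecture and weights
        prune_step(student, teacher, batch, prune_ratio)
        teacher = student
    return teacher
```

In this sketch, the intermediate copies play the role of teaching assistants: each is initialized from the previous level's weights, so the capacity gap between any student-teacher pair stays small while the saliency gradients always come from the corresponding teacher rather than the original full model.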