Paper Title
Hint-dynamic Knowledge Distillation
Paper Authors
Paper Abstract
Knowledge Distillation (KD) transfers knowledge from a high-capacity teacher model to promote a smaller student model. Existing efforts guide the distillation by matching their prediction logits, feature embeddings, etc., while leaving how to effectively utilize these hints in conjunction less explored. In this paper, we propose Hint-dynamic Knowledge Distillation, dubbed HKD, which excavates knowledge from the teacher's hints in a dynamic scheme. The guidance effect of the knowledge hints usually varies across instances and learning stages, which motivates us to adaptively customize a specific hint-learning manner for each instance. Specifically, a meta-weight network is introduced to generate instance-wise weight coefficients for the knowledge hints by perceiving the dynamic learning progress of the student model. We further present a weight ensembling strategy that exploits historical statistics to eliminate the potential bias of the coefficient estimation. Experiments on the standard benchmarks CIFAR-100 and Tiny-ImageNet show that the proposed HKD effectively boosts knowledge distillation.
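To make the described mechanism concrete, the following is a minimal, hypothetical PyTorch-style sketch of the idea: a meta-weight network predicts per-instance coefficients for several knowledge hints, and an exponential moving average over past predictions (one possible reading of "historical statistics") smooths the estimates. The module names, the choice of per-instance state, the EMA blending rule, and the way the meta-network is trained are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of hint-dynamic weighting for knowledge distillation.
# Assumptions: per-instance state = (per-sample KD loss, per-sample hint losses);
# coefficient smoothing = EMA "ensemble"; the meta-network's bi-level training
# scheme used in the paper is omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MetaWeightNet(nn.Module):
    """Maps a per-instance learning-progress state to hint weight coefficients."""

    def __init__(self, state_dim: int, num_hints: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, num_hints),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Softmax keeps the instance-wise coefficients positive and normalized.
        return torch.softmax(self.net(state), dim=-1)


def hkd_style_loss(student_logits, teacher_logits, hint_losses, meta_net,
                   ema_weights, ema_momentum=0.9, temperature=4.0):
    """Combine per-instance hint losses with meta-predicted, EMA-smoothed weights.

    hint_losses: (batch, num_hints) per-instance hint losses, e.g. per-layer
                 feature-matching losses between student and teacher.
    ema_weights: running (num_hints,) tensor acting as the historical statistic.
    """
    # Per-sample KD loss, used here as a simple proxy for learning progress.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="none",
    ).sum(-1, keepdim=True) * temperature ** 2
    state = torch.cat([kd.detach(), hint_losses.detach()], dim=-1)

    weights = meta_net(state)                                  # (batch, num_hints)
    # Ensemble current predictions with the historical average to reduce bias.
    ema_weights = ema_momentum * ema_weights + (1 - ema_momentum) * weights.mean(0).detach()
    blended = 0.5 * weights + 0.5 * ema_weights.unsqueeze(0)

    hint_term = (blended * hint_losses).sum(-1).mean()
    return kd.mean() + hint_term, ema_weights


if __name__ == "__main__":
    B, C, H = 8, 100, 3                     # batch size, classes, number of hints
    meta_net = MetaWeightNet(state_dim=1 + H, num_hints=H)
    ema = torch.full((H,), 1.0 / H)
    loss, ema = hkd_style_loss(torch.randn(B, C), torch.randn(B, C),
                               torch.rand(B, H), meta_net, ema)
    print(loss.item())
```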