Paper Title

What Knowledge Gets Distilled in Knowledge Distillation?

Authors

Utkarsh Ojha, Yuheng Li, Anirudh Sundara Rajan, Yingyu Liang, Yong Jae Lee

Abstract

Knowledge distillation aims to transfer useful information from a teacher network to a student network, with the primary goal of improving the student's performance for the task at hand. Over the years, there has been a deluge of novel techniques and use cases of knowledge distillation. Yet, despite the various improvements, there seems to be a glaring gap in the community's fundamental understanding of the process. Specifically, what is the knowledge that gets distilled in knowledge distillation? In other words, in what ways does the student become similar to the teacher? Does it start to localize objects in the same way? Does it get fooled by the same adversarial samples? Do its data invariance properties become similar? Our work presents a comprehensive study to try to answer these questions. We show that existing methods can indeed indirectly distill these properties beyond improving task performance. We further study why knowledge distillation might work this way, and show that our findings have practical implications as well.
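For context, the classic form of knowledge distillation (Hinton et al., 2015) trains the student to match the teacher's temperature-softened output distribution alongside the usual hard-label loss. Below is a minimal PyTorch sketch of that standard objective, not of the specific methods analyzed in this paper; the function name and hyperparameter values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Classic logit-matching KD loss (Hinton et al., 2015), for illustration.

    Combines a soft KL term between temperature-scaled teacher and student
    distributions with the usual hard-label cross-entropy.
    """
    # Soft targets: KL divergence between softened distributions.
    # The T*T factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

The paper's question is what, beyond the task loss above, the student implicitly inherits from the teacher through such objectives (localization behavior, adversarial vulnerability, data invariances).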
