Paper Title
Reducing Capacity Gap in Knowledge Distillation with Review Mechanism for Crowd Counting
Authors
Abstract
Lightweight crowd counting models, in particular those based on knowledge distillation (KD), have attracted growing attention in recent years because of their advantages in computational efficiency and hardware requirements. However, existing KD-based models usually suffer from the capacity gap issue, which leaves the performance of the student network limited by that of the teacher network. In this paper, we address this issue by introducing a novel review mechanism that follows the KD model, motivated by the way humans review material while studying. The proposed model is therefore dubbed ReviewKD. It consists of an instruction phase and a review phase. In the instruction phase, a well-trained heavy teacher network transfers its latent features to a lightweight student network. In the review phase, a review mechanism yields a refined estimate of the density map based on the learned features. The effectiveness of ReviewKD is demonstrated by a set of experiments on six benchmark datasets in comparison with state-of-the-art models. Numerical results show that ReviewKD outperforms existing lightweight crowd counting models and effectively alleviates the capacity gap issue; in particular, its performance can exceed that of the teacher network. Beyond lightweight models, we also show that the proposed review mechanism can be used as a plug-and-play module to further boost the performance of a class of heavy crowd counting models without modifying the network architecture or introducing any additional model parameters.
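The two phases described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the loss functions or the form of the review mechanism, so everything here is an assumption made for illustration: `feature_mse` stands in for a generic feature-transfer objective between teacher and student, and `review` is a hypothetical residual correction of an initial density map, not the authors' method. The only standard fact used is that a density map sums to the estimated crowd count.

```python
from typing import List

Matrix = List[List[float]]


def feature_mse(teacher: Matrix, student: Matrix) -> float:
    """Instruction phase (assumed form): mean squared error between the
    teacher's and the student's latent feature maps of equal shape."""
    total, n = 0.0, 0
    for t_row, s_row in zip(teacher, student):
        for t, s in zip(t_row, s_row):
            total += (t - s) ** 2
            n += 1
    return total / n


def crowd_count(density: Matrix) -> float:
    """The estimated head count is the sum over the density map."""
    return sum(sum(row) for row in density)


def review(density: Matrix, correction: Matrix) -> Matrix:
    """Review phase (hypothetical placeholder): refine an initial density
    map by adding a residual correction and clamping values at zero."""
    return [
        [max(0.0, d + c) for d, c in zip(d_row, c_row)]
        for d_row, c_row in zip(density, correction)
    ]


if __name__ == "__main__":
    teacher_feat = [[1.0, 2.0], [3.0, 4.0]]
    student_feat = [[1.0, 2.0], [3.0, 2.0]]
    print("feature loss:", feature_mse(teacher_feat, student_feat))

    initial = [[0.5, 0.25], [0.25, 1.0]]
    residual = [[-1.0, 0.25], [0.0, 0.0]]
    refined = review(initial, residual)
    print("count before review:", crowd_count(initial))
    print("count after review:", crowd_count(refined))
```

In a real system the feature maps would be tensors produced by the teacher and student networks, and the correction would come from the review mechanism itself rather than being supplied by hand.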