Paper Title
Be Your Own Best Competitor! Multi-Branched Adversarial Knowledge Transfer
Paper Authors
Paper Abstract
Deep neural network architectures have attained remarkable improvements in scene understanding tasks. Employing an efficient model is one of the most important constraints for resource-limited devices. Recently, several compression methods have been proposed to reduce the heavy computational burden and memory consumption. Among them, pruning and quantization methods suffer a significant drop in performance when compressing the model parameters, whereas knowledge distillation methods improve the performance of compact models by training lightweight networks under the supervision of cumbersome networks. In the proposed method, knowledge distillation is performed within the network by constructing multiple branches over the primary stream of the model, an approach known as self-distillation. An ensemble of sub-neural-network models is therefore proposed to transfer knowledge among its members through knowledge distillation policies as well as an adversarial learning strategy. Hence, the proposed ensemble of sub-models is trained adversarially against a discriminator model. In addition, their knowledge is transferred within the ensemble through four different loss functions. The proposed method is applied to both lightweight image classification and encoder-decoder architectures to boost the performance of small and compact models without incurring extra computational overhead at inference time. Extensive experimental results on the main challenging datasets show that the proposed network outperforms the primary model in terms of accuracy at the same number of parameters and computational cost. The obtained results show that the proposed model achieves a significant improvement over earlier self-distillation methods. The effectiveness of the proposed models is also illustrated on the encoder-decoder model.
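The abstract describes attaching auxiliary branches to the primary stream, distilling from the deepest branch to the shallower ones, and training the branch ensemble against a discriminator. Below is a minimal PyTorch sketch of that idea under loose assumptions: the layer sizes, the choice of the deepest branch as teacher, and the loss terms (cross-entropy, temperature-scaled KL distillation, and a GAN-style adversarial term with illustrative weight 0.1) are stand-ins, not the paper's exact four losses or architecture.

```python
# Sketch of multi-branch self-distillation with an adversarial discriminator.
# All layer shapes, loss choices, and weights are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiBranchNet(nn.Module):
    """Primary stream with an auxiliary classifier branch at an intermediate depth."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.block1 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        self.branch1 = nn.Linear(64, num_classes)   # shallow auxiliary branch
        self.head = nn.Linear(128, num_classes)     # deepest (primary) classifier

    def forward(self, x):
        f1 = self.block1(self.stem(x))
        f2 = self.block2(f1)
        p1 = self.branch1(F.adaptive_avg_pool2d(f1, 1).flatten(1))
        p2 = self.head(F.adaptive_avg_pool2d(f2, 1).flatten(1))
        return [p1, p2]                              # logits, shallow to deep

class Discriminator(nn.Module):
    """Judges whether a logit vector comes from the deepest branch or a shallow one."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(num_classes, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, logits):
        return self.net(logits)

def training_step(model, disc, x, y, opt_m, opt_d, T=3.0):
    logits = model(x)
    teacher = logits[-1].detach()                    # deepest branch acts as teacher
    # Hard-label supervision on every branch.
    loss_ce = sum(F.cross_entropy(p, y) for p in logits)
    # Soft-label distillation from the deepest branch to shallow branches.
    loss_kd = sum(F.kl_div(F.log_softmax(p / T, dim=1),
                           F.softmax(teacher / T, dim=1),
                           reduction="batchmean") * T * T for p in logits[:-1])
    # Generator side: shallow branches try to fool the discriminator.
    loss_adv = 0.0
    for p in logits[:-1]:
        d_out = disc(p)
        loss_adv = loss_adv + F.binary_cross_entropy_with_logits(
            d_out, torch.ones_like(d_out))
    opt_m.zero_grad()
    (loss_ce + loss_kd + 0.1 * loss_adv).backward()
    opt_m.step()
    # Discriminator side: deepest-branch logits are "real", shallow ones are "fake".
    d_real_out = disc(teacher)
    d_loss = F.binary_cross_entropy_with_logits(d_real_out, torch.ones_like(d_real_out))
    for p in logits[:-1]:
        d_fake_out = disc(p.detach())
        d_loss = d_loss + F.binary_cross_entropy_with_logits(
            d_fake_out, torch.zeros_like(d_fake_out))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
```

In this sketch the auxiliary branches can be discarded after training, so inference runs only the primary stream, which matches the abstract's claim of no extra computational overhead at inference time.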