Entropy Induced Pruning Framework for Convolutional Neural Networks

Paper Title

Entropy Induced Pruning Framework for Convolutional Neural Networks

Authors

Lu, Yiheng; Guan, Ziyu; Yang, Yaming; Gong, Maoguo; Zhao, Wei; Feng, Kaiyuan

Abstract

Structured pruning techniques have achieved strong compression performance on convolutional neural networks for image classification tasks. However, most existing methods are weight-oriented, and their pruning results may be unsatisfactory when the original model is poorly trained. That is, a fully trained model is required to provide useful weight information. Obtaining one can be time-consuming, and the pruning results are sensitive to how the model parameters are updated during training. In this paper, we propose a metric named Average Filter Information Entropy (AFIE) to measure the importance of each filter. It is calculated in three major steps: low-rank decomposition of the "input-output" matrix of each convolutional layer, normalization of the obtained eigenvalues, and calculation of filter importance based on information entropy. By leveraging AFIE, the proposed framework yields a stable importance evaluation of each filter regardless of whether the original model is fully trained. We implement AFIE on AlexNet, VGG-16, and ResNet-50, and test them on MNIST, CIFAR-10, and ImageNet, respectively. The experimental results are encouraging. Surprisingly, we observe that for our method, even when the original model is trained for only one epoch, the importance evaluation of each filter remains identical to the result obtained when the model is fully trained. This indicates that the proposed pruning strategy can be applied effectively at the beginning of the training process for the original model.
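
The last two of the three steps named in the abstract (normalization, then an entropy-based score) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything specific here is an assumption made for illustration: normalizing by the sum, using Shannon entropy in nats, sharing a layer's entropy equally among its filters, and the function names themselves. The eigenvalues are taken as given; in practice they would come from a decomposition of the layer's "input-output" matrix, which is not shown.

```python
import math


def normalize(values):
    """Scale non-negative values so they sum to 1 (assumed normalization)."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [v / total for v in values]


def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def average_filter_entropy(spectrum, num_filters):
    """Layer entropy shared equally among the layer's filters (assumption)."""
    return shannon_entropy(normalize(spectrum)) / num_filters


# Hypothetical eigenvalue spectra for two layers with 4 filters each.
flat = average_filter_entropy([1.0, 1.0, 1.0, 1.0], num_filters=4)
peaked = average_filter_entropy([10.0, 0.1, 0.1, 0.1], num_filters=4)
print(flat > peaked)  # a flat spectrum carries more entropy per filter
```

Under these assumptions, a layer whose spectrum is dominated by a few large values receives a lower per-filter score than a layer with an evenly spread spectrum, which is the kind of signal a pruning criterion could use to decide how many filters each layer keeps.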
