Paper Title
Compressing Deep Convolutional Neural Networks by Stacking Low-dimensional Binary Convolution Filters
Paper Authors
Paper Abstract
Deep Convolutional Neural Networks (CNNs) have been successfully applied to many real-life problems. However, the huge memory cost of deep CNN models poses a great challenge for deploying them on memory-constrained devices (e.g., mobile phones). One popular way to reduce the memory cost of a deep CNN model is to train a binary CNN, where the weights in the convolution filters are either 1 or -1 and each weight can therefore be stored efficiently using a single bit. However, the compression ratio of existing binary CNN models is upper bounded by around 32. To address this limitation, we propose a novel method to compress deep CNN models by stacking low-dimensional binary convolution filters. Our proposed method approximates a standard convolution filter by selecting and stacking filters from a set of low-dimensional binary convolution filters. This set of low-dimensional binary filters is shared across all filters in a given convolution layer, so our method achieves a much larger compression ratio than binary CNN models. To train the proposed model, we show theoretically that it is equivalent to selecting and stacking intermediate feature maps generated by the low-dimensional binary filters; the model can therefore be trained efficiently using the split-transform-merge strategy. We also provide a detailed analysis of the memory and computation costs of our model at inference time. We compared the proposed method with five other popular model compression techniques on two benchmark datasets. Our experimental results demonstrate that the proposed method achieves a much higher compression ratio than existing methods while maintaining comparable accuracy.
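The core idea of the abstract — approximating a full-precision convolution filter by selecting and stacking entries from a shared set of low-dimensional binary filters — can be sketched in NumPy. This is a minimal illustration, not the paper's algorithm: the filter sizes, the random codebook, the nearest-correlation selection rule, and the per-slice scaling factors are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper): a 3x3 filter over
# C = 64 input channels, approximated from a shared set of m = 256
# low-dimensional binary filters, each spanning d = 8 channels.
k, C, d, m = 3, 64, 8, 256

# Shared codebook of low-dimensional binary filters with +1/-1 entries,
# reused by every filter in the layer.
codebook = rng.choice([-1.0, 1.0], size=(m, k, k, d))

# A full-precision convolution filter to approximate.
W = rng.standard_normal((k, k, C))

# Approximate W slice by slice: for each channel slice of width d, pick
# the codebook entry with the largest absolute correlation and scale it
# (least-squares optimal scale for a +/-1 filter is score / (k*k*d)).
approx = np.empty_like(W)
indices = []
for s in range(C // d):
    slice_ = W[:, :, s * d:(s + 1) * d]
    scores = np.einsum('mijd,ijd->m', codebook, slice_)
    best = int(np.argmax(np.abs(scores)))
    alpha = scores[best] / (k * k * d)
    approx[:, :, s * d:(s + 1) * d] = alpha * codebook[best]
    indices.append(best)

# Storage comparison: a full-precision filter stores k*k*C 32-bit weights;
# the stacked version stores one codebook index plus one float scale per
# slice (the codebook itself is shared and amortized across the layer).
full_bits = k * k * C * 32
index_bits = (C // d) * (np.log2(m) + 32)
print(full_bits / index_bits)  # 57.6 for these sizes, i.e. above the 32x binary bound
```

The printed ratio exceeds 32 because each d-channel slice is replaced by a log2(m)-bit index instead of d binary weights, which is how the method can pass the compression ceiling of plain binary CNNs.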