Paper Title
SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation
Paper Authors
Paper Abstract
Quantization of deep neural networks (DNN) has been proven effective for compressing and accelerating DNN models. Data-free quantization (DFQ) is a promising approach because it requires no access to the original datasets, which suits privacy-sensitive and confidential scenarios. However, current DFQ solutions degrade accuracy, need synthetic data to calibrate networks, and are time-consuming and costly. This paper proposes an on-the-fly DFQ framework with sub-second quantization time, called SQuant, which can quantize networks on inference-only devices with low computation and memory requirements. Through a theoretical analysis of the second-order information of the DNN task loss, we decompose and approximate the Hessian-based optimization objective into three diagonal sub-items, which act at different granularities corresponding to three dimensions of the weight tensor: element-wise, kernel-wise, and output-channel-wise. We then progressively compose the sub-items and propose a novel data-free optimization objective in the discrete domain, minimizing the Constrained Absolute Sum of Error (CASE for short), which surprisingly requires no dataset and is not even aware of the network architecture. We also design an efficient algorithm without back-propagation to further reduce the computational complexity of the objective solver. Finally, without fine-tuning or synthetic datasets, SQuant accelerates the data-free quantization process to the sub-second level and achieves >30% accuracy improvement over existing data-free post-training quantization works on the evaluated models under 4-bit quantization. We have open-sourced the SQuant framework at https://github.com/clevercool/SQuant.
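To make the CASE objective concrete, below is a minimal NumPy sketch of the idea the abstract describes at the kernel granularity: round each weight element to the nearest grid point, then flip the cheapest roundings until the accumulated (signed) rounding error of the kernel is constrained to [-0.5, 0.5]. This is an illustrative reconstruction from the abstract only, not the authors' released implementation; the function name squant_kernel_sketch, the flip-selection rule, and the 0.5 constraint are assumptions. Note that, matching the abstract's claims, it uses no data and no back-propagation.

import numpy as np

def squant_kernel_sketch(w, scale):
    # Quantize one flattened kernel w (1-D float array) on the given scale,
    # then flip roundings to constrain the kernel's summed rounding error.
    x = w / scale                  # weights expressed on the integer grid
    q = np.round(x)                # element-wise nearest rounding
    e = x - q                      # per-element rounding error in [-0.5, 0.5]
    total = e.sum()                # accumulated signed error of the kernel

    # Each flip moves the accumulated error by exactly 1, so k flips
    # suffice to bring |total| within 0.5.
    k = max(int(np.ceil(abs(total) - 0.5)), 0)
    if k > 0:
        if total > 0:
            # Flip up the k elements with the largest positive error:
            # these flips add the least absolute error per element.
            idx = np.argsort(-e)[:k]
            q[idx] += 1
        else:
            # Symmetric case: flip down the k most negative errors.
            idx = np.argsort(e)[:k]
            q[idx] -= 1
    return q * scale               # dequantized weights

# Toy usage: one 3x3 kernel with a fixed, assumed scale of 0.1.
rng = np.random.default_rng(0)
w = rng.normal(size=9).astype(np.float32)
w_q = squant_kernel_sketch(w, 0.1)
print("constrained error:", (w - w_q).sum() / 0.1)  # now within [-0.5, 0.5]

The paper composes this kernel-wise constraint with the element-wise and output-channel-wise sub-items; the sketch shows only the single-kernel step to illustrate why the solver needs neither data nor gradients.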