Paper Title

A Closer Look at Hardware-Friendly Weight Quantization

Paper Authors

Sungmin Bae, Piotr Zielinski, Satrajit Chatterjee

Paper Abstract

Quantizing a Deep Neural Network (DNN) model to be used on a custom accelerator with efficient fixed-point hardware implementations requires satisfying many stringent hardware-friendly quantization constraints to train the model. We evaluate the two main classes of hardware-friendly quantization methods in the context of weight quantization: the traditional Mean Squared Quantization Error (MSQE)-based methods and the more recent gradient-based methods. We study the two methods on MobileNetV1 and MobileNetV2 using multiple empirical metrics to identify the sources of performance differences between the two classes, namely, sensitivity to outliers and convergence instability of the quantizer scaling factor. Using those insights, we propose various techniques to improve the performance of both quantization methods; they fix the optimization instability issues present in the MSQE-based methods during quantization of MobileNet models, and allow us to improve the validation performance of the gradient-based methods by 4.0% and 3.3% for MobileNetV1 and MobileNetV2 on ImageNet, respectively.
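To make the MSQE-based class concrete, below is a minimal Python sketch (not the paper's implementation) of a grid search for the scaling factor of a symmetric fixed-point weight quantizer that minimizes mean squared quantization error. The names quantize and msqe_scale, the candidate grid, and the toy weight distribution are all illustrative assumptions. The example also shows the outlier sensitivity the abstract identifies: a single large weight drags the MSQE-optimal scale upward, coarsening the quantization grid for the bulk of the weights. Gradient-based methods instead treat the scale as a trainable parameter updated through a straight-through estimator, which sidesteps this pull but, per the abstract, can suffer convergence instability.

import numpy as np

def quantize(w, scale, n_bits=8):
    # Symmetric uniform quantizer: round onto an integer grid, then clip to range.
    qmax = 2 ** (n_bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale

def msqe_scale(w, n_bits=8, n_candidates=200):
    # Grid-search the scaling factor minimizing mean squared quantization error.
    qmax = 2 ** (n_bits - 1) - 1
    s_max = np.abs(w).max() / qmax  # scale that just covers the largest weight
    candidates = np.linspace(0.05 * s_max, s_max, n_candidates)
    errors = [np.mean((w - quantize(w, s, n_bits)) ** 2) for s in candidates]
    return candidates[int(np.argmin(errors))]

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=1024)    # a typical well-behaved weight tensor
w_outlier = np.append(w, 0.5)           # the same tensor plus a single outlier

print(msqe_scale(w, n_bits=4))          # small scale: fine grid for the bulk of weights
print(msqe_scale(w_outlier, n_bits=4))  # outlier inflates the MSQE-optimal scale

Running this at 4 bits, the second call should return a noticeably larger scale than the first: the MSQE objective trades rounding error on the many small weights against clipping error on the one outlier, illustrating (under these toy assumptions) why MSQE-based scale selection is outlier-sensitive.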
