Paper Title
FracBits: Mixed Precision Quantization via Fractional Bit-Widths
Paper Authors
Paper Abstract
Model quantization helps to reduce the model size and latency of deep neural networks. Mixed precision quantization is favorable on customized hardware that supports arithmetic operations at multiple bit-widths, enabling maximum efficiency. We propose a novel learning-based algorithm to derive mixed precision models end-to-end under target computation constraints and model sizes. During optimization, the bit-width of each layer/kernel in the model is in a fractional state between two consecutive bit-widths and can be adjusted gradually. With a differentiable regularization term, the resource constraints can be met during quantization-aware training, resulting in an optimized mixed precision model. Furthermore, our method can be naturally combined with channel pruning for better allocation of computation cost. Our final models achieve comparable or better performance than previous mixed precision quantization methods on MobileNetV1/V2 and ResNet18 under different resource constraints on the ImageNet dataset.
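To make the fractional bit-width idea concrete, below is a minimal PyTorch sketch, not the paper's actual implementation: the function names `uniform_quantize` and `fracbits_quantize`, the value range, and the simple uniform quantizer are illustrative assumptions. The output at a fractional bit-width b is a linear interpolation between quantization at the two neighboring integer bit-widths, so b receives a gradient through the interpolation weight and can be learned end-to-end.

```python
import torch

def uniform_quantize(x, bits, x_min=-1.0, x_max=1.0):
    # Uniform quantization of x to an integer bit-width on [x_min, x_max].
    # A real quantization-aware training setup would pair round() with a
    # straight-through estimator so gradients also flow to x.
    levels = 2 ** bits - 1
    scale = (x_max - x_min) / levels
    return torch.round((x.clamp(x_min, x_max) - x_min) / scale) * scale + x_min

def fracbits_quantize(x, b):
    # A fractional bit-width b lies between the two consecutive integer
    # bit-widths floor(b) and floor(b)+1; the result is a linear
    # interpolation of the two quantized values, so the gradient
    # w.r.t. b flows through the interpolation weight `frac`.
    lo = int(torch.floor(b))
    frac = b - lo  # in [0, 1), differentiable w.r.t. b
    return (1 - frac) * uniform_quantize(x, lo) + frac * uniform_quantize(x, lo + 1)

x = torch.randn(8)
b = torch.tensor(3.4, requires_grad=True)  # learnable fractional bit-width
y = fracbits_quantize(x, b)
y.sum().backward()  # gradient flows into b via the interpolation weight
print(b.grad)
```

On top of this, a differentiable resource regularizer along the lines the abstract describes could, for instance, penalize the bit-weighted cost against a budget (e.g. `lam * torch.relu(sum(b_i * cost_i) - budget)`); the exact form here is an assumption, not the paper's formula.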