Paper Title

QFT: Post-training quantization via fast joint finetuning of all degrees of freedom

Paper Authors

Alex Finkelstein, Ella Fuchs, Idan Tal, Mark Grobman, Niv Vosco, Eldad Meller

Paper Abstract

The post-training quantization (PTQ) challenge of bringing quantized neural net accuracy close to original has drawn much attention driven by industry demand. Many of the methods emphasize optimization of a specific degree-of-freedom (DoF), such as quantization step size, preconditioning factors, bias fixing, often chained to others in multi-step solutions. Here we rethink quantized network parameterization in HW-aware fashion, towards a unified analysis of all quantization DoF, permitting for the first time their joint end-to-end finetuning. Our single-step simple and extendable method, dubbed quantization-aware finetuning (QFT), achieves 4-bit weight quantization results on-par with SoTA within PTQ constraints of speed and resource.
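The abstract's central idea, making every quantization degree of freedom (weights, step sizes, preconditioning factors, biases) a trainable parameter so they can be finetuned jointly end-to-end, can be illustrated with a minimal PyTorch-style sketch. This is not the authors' code: the layer name QATLinear, the parameters step and precond, and the straight-through rounding are illustrative assumptions about how such joint optimization is commonly wired up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QATLinear(nn.Module):
    """Illustrative sketch (not the paper's implementation): weight, bias,
    per-channel step size, and a preconditioning (channel-equalization)
    factor are all nn.Parameters, so one optimizer finetunes every
    quantization DoF jointly."""

    def __init__(self, linear: nn.Linear, n_bits: int = 4):
        super().__init__()
        # Assumes linear.bias is not None; a full version would handle both cases.
        self.weight = nn.Parameter(linear.weight.detach().clone())
        self.bias = nn.Parameter(linear.bias.detach().clone())
        self.qmax = 2 ** (n_bits - 1) - 1
        # Per-output-channel step size, initialized from the weight range.
        self.step = nn.Parameter(
            self.weight.abs().amax(dim=1, keepdim=True) / self.qmax)
        # Per-input-channel preconditioning factor; its inverse is applied to
        # the activations, so the float function is preserved at initialization.
        self.precond = nn.Parameter(torch.ones(1, linear.in_features))

    def forward(self, x):
        w = self.weight * self.precond                   # precondition weights
        q = torch.clamp(w / self.step, -self.qmax - 1, self.qmax)
        q = q + (q.round() - q).detach()                 # straight-through rounding
        w_q = q * self.step                              # fake-quantized weights
        return F.linear(x / self.precond, w_q, self.bias)
```

In a typical use of such a wrapper, each layer of the pretrained network would be replaced by this module and briefly finetuned on a small calibration set, consistent with the PTQ speed and resource constraints the abstract refers to.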
