Paper Title

QFT: Post-training quantization via fast joint finetuning of all degrees of freedom

Paper Authors

Alex Finkelstein, Ella Fuchs, Idan Tal, Mark Grobman, Niv Vosco, Eldad Meller

Paper Abstract

The post-training quantization (PTQ) challenge of bringing quantized neural net accuracy close to original has drawn much attention driven by industry demand. Many of the methods emphasize optimization of a specific degree-of-freedom (DoF), such as quantization step size, preconditioning factors, bias fixing, often chained to others in multi-step solutions. Here we rethink quantized network parameterization in HW-aware fashion, towards a unified analysis of all quantization DoF, permitting for the first time their joint end-to-end finetuning. Our single-step simple and extendable method, dubbed quantization-aware finetuning (QFT), achieves 4-bit weight quantization results on-par with SoTA within PTQ constraints of speed and resource.
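The abstract's central idea, making every quantization degree of freedom (weights, step sizes, preconditioning factors, biases) a trainable parameter so they can be finetuned jointly end-to-end, can be illustrated with a minimal PyTorch-style sketch. This is not the authors' code: the layer name QATLinear, the parameters step and precond, and the straight-through rounding are illustrative assumptions about how such joint optimization is commonly wired up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QATLinear(nn.Module):
    """Illustrative sketch (not the paper's implementation): weight, bias,
    per-channel step size, and a preconditioning (channel-equalization)
    factor are all nn.Parameters, so one optimizer finetunes every
    quantization DoF jointly."""

    def __init__(self, linear: nn.Linear, n_bits: int = 4):
        super().__init__()
        # Assumes linear.bias is not None; a full version would handle both cases.
        self.weight = nn.Parameter(linear.weight.detach().clone())
        self.bias = nn.Parameter(linear.bias.detach().clone())
        self.qmax = 2 ** (n_bits - 1) - 1
        # Per-output-channel step size, initialized from the weight range.
        self.step = nn.Parameter(
            self.weight.abs().amax(dim=1, keepdim=True) / self.qmax)
        # Per-input-channel preconditioning factor; its inverse is applied to
        # the activations, so the float function is preserved at initialization.
        self.precond = nn.Parameter(torch.ones(1, linear.in_features))

    def forward(self, x):
        w = self.weight * self.precond                   # precondition weights
        q = torch.clamp(w / self.step, -self.qmax - 1, self.qmax)
        q = q + (q.round() - q).detach()                 # straight-through rounding
        w_q = q * self.step                              # fake-quantized weights
        return F.linear(x / self.precond, w_q, self.bias)
```

In a typical use of such a wrapper, each layer of the pretrained network would be replaced by this module and briefly finetuned on a small calibration set, consistent with the PTQ speed and resource constraints the abstract refers to.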
