Paper Title
Training with Quantization Noise for Extreme Model Compression
Paper Authors
Paper Abstract
We tackle the problem of producing compact models, maximizing their accuracy for a given model size. A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator (STE). In this paper, we extend this approach to work beyond int8 fixed-point quantization with extreme compression methods where the approximations introduced by STE are severe, such as Product Quantization. Our proposal is to quantize only a different random subset of weights during each forward pass, allowing unbiased gradients to flow through the other weights. Controlling the amount of noise and its form allows for extreme compression rates while maintaining the performance of the original model. As a result, we establish new state-of-the-art compromises between accuracy and model size both in natural language processing and image classification. For example, applying our method to state-of-the-art Transformer and ConvNet architectures, we can achieve 82.5% accuracy on MNLI by compressing RoBERTa to 14 MB and 80.0% top-1 accuracy on ImageNet by compressing an EfficientNet-B3 to 3.3 MB.
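The following is a minimal PyTorch-style sketch of the quantization-noise idea described in the abstract, not the authors' implementation: the function name `quant_noise` and the parameters `p` and `bits` are illustrative, and simple uniform scalar quantization stands in for Product Quantization. During training, only a random fraction `p` of the weights is replaced by its quantized value (with a straight-through estimator), while the remaining weights keep full precision so unbiased gradients flow through them.

```python
import torch

def quant_noise(weight: torch.Tensor, p: float = 0.5, bits: int = 8,
                training: bool = True) -> torch.Tensor:
    """Apply quantization noise to a weight tensor (illustrative sketch).

    At inference every weight is quantized; during training only a random
    fraction `p` of the weights sees quantization, the rest stays in
    full precision so unbiased gradients flow through them.
    """
    # Uniform scalar quantization to 2^bits levels (stand-in for Product
    # Quantization used in the paper).
    scale = weight.detach().abs().max().clamp(min=1e-8) / (2 ** (bits - 1) - 1)
    quantized = torch.round(weight / scale).clamp(
        -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    ) * scale

    if not training:
        return quantized

    # Random mask selecting the subset of weights that receive quantization
    # noise on this forward pass.
    mask = (torch.rand_like(weight) < p).float()

    # Straight-through estimator on the masked subset: the forward pass uses
    # the quantized values, the backward pass treats the operation as identity.
    return weight + mask * (quantized - weight).detach()
```

As a usage sketch, a layer would call `quant_noise(self.weight, p=0.5, training=self.training)` before its matrix multiplication in `forward`, so each training step quantizes a different random subset of weights while evaluation uses the fully quantized weights.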