Paper Title
Quadapter: Adapter for GPT-2 Quantization
Paper Authors
Paper Abstract
Transformer language models such as GPT-2 are difficult to quantize because of outliers in activations that lead to a large quantization error. To adapt to the error, one must use quantization-aware training, which entails a fine-tuning process based on a dataset and training pipeline identical to those used for the original model. Pretrained language models, however, often do not grant access to their datasets and training pipelines, forcing us to rely on arbitrary ones for fine-tuning. In that case, quantization-aware training is observed to overfit the model to the fine-tuning data. For quantization without overfitting, we introduce a quantization adapter (Quadapter), a small set of parameters learned to make activations quantization-friendly by scaling them channel-wise; the model parameters themselves are kept unchanged. By applying our method to the challenging task of quantizing GPT-2, we demonstrate that it effectively prevents overfitting and improves quantization performance.
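The abstract only sketches the mechanism, so the following is a minimal, hypothetical PyTorch illustration of the idea as described: a learnable per-channel scale is applied to activations before (fake) quantization and undone afterwards, and only the adapter parameters are trained while the model weights stay frozen. The `fake_quantize` helper, the direction of the scaling, and the placement of the adapter are assumptions made for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


def fake_quantize(x, num_bits=8):
    """Symmetric per-tensor fake quantization with a straight-through estimator.

    Illustrative stand-in for whatever quantizer the paper actually uses.
    """
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    x_q = torch.round(x / scale).clamp(-qmax - 1, qmax) * scale
    # Quantized values in the forward pass, identity gradient in the backward pass.
    return x + (x_q - x).detach()


class Quadapter(nn.Module):
    """Learnable per-channel scaling wrapped around activation quantization.

    Only `scale` is trained; the surrounding model weights stay frozen,
    matching the abstract's claim that model parameters are unchanged.
    """

    def __init__(self, num_channels):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(num_channels))

    def forward(self, x):
        # x: (..., num_channels). Rescale per channel so the activation range is
        # easier to quantize, then undo the scaling after quantization so the
        # full-precision function is preserved up to the quantization error.
        x_scaled = x / self.scale
        x_q = fake_quantize(x_scaled)
        return x_q * self.scale


# Usage sketch on GPT-2-sized hidden states (hidden size 768).
adapter = Quadapter(num_channels=768)
hidden = torch.randn(2, 16, 768)
out = adapter(hidden)
```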