Paper Title
TernaryBERT: Distillation-aware Ultra-low Bit BERT

Authors

Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu

Abstract

Transformer-based pre-training models like BERT have achieved remarkable performance in many natural language processing tasks. However, these models are both computation and memory expensive, hindering their deployment to resource-constrained devices. In this work, we propose TernaryBERT, which ternarizes the weights in a fine-tuned BERT model. Specifically, we use both approximation-based and loss-aware ternarization methods and empirically investigate the ternarization granularity of different parts of BERT. Moreover, to reduce the accuracy degradation caused by the lower capacity of low bits, we leverage the knowledge distillation technique in the training process. Experiments on the GLUE benchmark and SQuAD show that our proposed TernaryBERT outperforms the other BERT quantization methods, and even achieves comparable performance as the full-precision model while being 14.9x smaller.
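
The abstract mentions approximation-based ternarization of BERT's weights but does not spell out the procedure. Below is a minimal NumPy sketch of a TWN-style approximation-based ternarizer that maps a full-precision weight matrix to values in {-α, 0, +α}; the 0.7·mean(|W|) threshold rule, the whole-matrix granularity, and the helper name `ternarize_twn` are illustrative assumptions, not the paper's exact recipe.

```python
# A minimal sketch of approximation-based (TWN-style) ternarization.
# Assumption: one scale per weight matrix and a 0.7 * mean(|W|) threshold;
# the paper studies other granularities, which are not shown here.
import numpy as np

def ternarize_twn(W: np.ndarray) -> np.ndarray:
    """Approximate W by alpha * T with ternary codes T in {-1, 0, +1}."""
    delta = 0.7 * np.mean(np.abs(W))            # threshold for zeroing weights
    mask = np.abs(W) > delta                    # weights that remain non-zero
    alpha = np.abs(W[mask]).mean() if mask.any() else 0.0  # closed-form scale
    T = np.sign(W) * mask                       # ternary codes
    return alpha * T                            # de-quantized approximation

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 8)).astype(np.float32)
    W_t = ternarize_twn(W)
    print("unique values:", np.unique(W_t))     # roughly {-alpha, 0, +alpha}
    print("approximation error:", np.linalg.norm(W - W_t))
```

In a distillation-aware setup like the one the abstract describes, a student quantized this way would be trained to mimic the full-precision teacher during fine-tuning rather than fitting the task labels alone.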
