Paper Title

Intriguing Properties of Compression on Multilingual Models

Authors

Kelechi Ogueji, Orevaoghene Ahia, Gbemileke Onilude, Sebastian Gehrmann, Sara Hooker, Julia Kreutzer

Abstract

Multilingual models are often particularly dependent on scaling to generalize to a growing number of languages. Compression techniques are widely relied upon to reconcile the growth in model size with real world resource constraints, but compression can have a disparate effect on model performance for low-resource languages. It is thus crucial to understand the trade-offs between scale, multilingualism, and compression. In this work, we propose an experimental framework to characterize the impact of sparsifying multilingual pre-trained language models during fine-tuning. Applying this framework to mBERT named entity recognition models across 40 languages, we find that compression confers several intriguing and previously unknown generalization properties. In contrast to prior findings, we find that compression may improve model robustness over dense models. We additionally observe that under certain sparsification regimes compression may aid, rather than disproportionately impact the performance of low-resource languages.
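To make the setup concrete, below is a minimal sketch of sparsifying a multilingual pre-trained model for NER during fine-tuning, using global magnitude pruning in PyTorch. The pruning method, the 50% sparsity level, the `bert-base-multilingual-cased` checkpoint, and the 9-label tag set are illustrative assumptions and not necessarily the paper's exact experimental framework.

```python
# Sketch: global magnitude pruning of an mBERT token-classification model
# before/during fine-tuning for NER. Assumptions (not from the paper):
# pruning method (L1 magnitude), 50% sparsity, checkpoint name, 9 NER labels.
import torch
from torch.nn.utils import prune
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=9  # e.g. a CoNLL-style tag set
)

# Collect the Linear weight matrices of the encoder as pruning targets.
to_prune = [
    (module, "weight")
    for module in model.bert.encoder.modules()
    if isinstance(module, torch.nn.Linear)
]

# Zero out the 50% smallest-magnitude weights globally across the encoder;
# prune attaches masks so the weights stay zero in the forward pass.
prune.global_unstructured(
    to_prune, pruning_method=prune.L1Unstructured, amount=0.5
)

# ... fine-tune on multilingual NER data here; masked weights remain zero ...

# Fold the masks into the weight tensors to make the sparsity permanent.
for module, name in to_prune:
    prune.remove(module, name)

# Report the achieved sparsity over the encoder's weight matrices.
zeros = sum((m.weight == 0).sum().item() for m, _ in to_prune)
total = sum(m.weight.numel() for m, _ in to_prune)
print(f"Fraction of zero weights in pruned matrices: {zeros / total:.2%}")
```

Per-language effects can then be characterized by evaluating the resulting sparse model on each language's NER test set and comparing against the dense baseline at several sparsity levels.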
