Paper Title

CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing

Paper Authors

Chen Liang, Pengcheng He, Yelong Shen, Weizhu Chen, Tuo Zhao

Paper Abstract

Model ensemble is a popular approach to produce a low-variance and well-generalized model. However, it induces large memory and inference costs, which are often not affordable for real-world deployment. Existing work has resorted to sharing weights among models. However, when increasing the proportion of the shared weights, the resulting models tend to be similar, and the benefits of using model ensemble diminish. To retain ensemble benefits while maintaining a low memory cost, we propose a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO. Specifically, we share the weights of bottom layers across all models and apply different perturbations to the hidden representations for different models, which can effectively promote the model diversity. Meanwhile, we apply a prediction consistency regularizer across the perturbed models to control the variance due to the model diversity. Our experiments using large language models demonstrate that CAMERO significantly improves the generalization performance of the ensemble model. Specifically, CAMERO outperforms the standard ensemble of 8 BERT-base models on the GLUE benchmark by 0.7 with a significantly smaller model size (114.2M vs. 880.6M).
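
The abstract describes three ingredients: bottom layers whose weights are shared across all ensemble members, a distinct perturbation of the hidden representation for each member, and a prediction-consistency regularizer across the perturbed members. The sketch below is a minimal, illustrative rendering of that idea, not the authors' implementation: it assumes Gaussian noise as the perturbation, per-member linear heads as the unshared top layers, and a KL divergence toward the averaged prediction as one possible instantiation of the consistency term. All module names and hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PerturbedEnsemble(nn.Module):
    def __init__(self, hidden_dim=64, num_classes=3, num_models=4, noise_std=0.1):
        super().__init__()
        # Bottom layers: weights shared by every ensemble member.
        self.shared_bottom = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        # Top layers: one small head per member (not shared), standing in for
        # whatever unshared layers an actual model would keep.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, num_classes) for _ in range(num_models)]
        )
        self.noise_std = noise_std

    def forward(self, x):
        h = self.shared_bottom(x)
        logits = []
        for head in self.heads:
            # Different perturbation per member: here, independent Gaussian noise
            # added to the shared hidden representation (an illustrative choice).
            h_pert = h + self.noise_std * torch.randn_like(h)
            logits.append(head(h_pert))
        return logits  # list of [batch, num_classes] tensors, one per member


def consistency_regularized_loss(logits_list, labels, consistency_weight=1.0):
    # Task loss: average cross-entropy over the perturbed members.
    task_loss = sum(F.cross_entropy(l, labels) for l in logits_list) / len(logits_list)
    # Consistency regularizer: pull each member's prediction toward the
    # ensemble-average distribution (one simple way to penalize disagreement).
    probs = [F.softmax(l, dim=-1) for l in logits_list]
    mean_prob = torch.stack(probs).mean(dim=0).detach()
    consistency = sum(
        F.kl_div(F.log_softmax(l, dim=-1), mean_prob, reduction="batchmean")
        for l in logits_list
    ) / len(logits_list)
    return task_loss + consistency_weight * consistency


# Tiny usage example with random data.
model = PerturbedEnsemble()
x = torch.randn(8, 64)
y = torch.randint(0, 3, (8,))
loss = consistency_regularized_loss(model(x), y)
loss.backward()
```

Because the bottom layers are shared, the memory cost grows only with the small per-member heads, while the per-member perturbations keep the members' predictions from collapsing onto one another; the consistency term then bounds how far those predictions drift apart.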
