Paper Title

ALBETO and DistilBETO: Lightweight Spanish Language Models

Paper Authors

José Cañete, Sebastián Donoso, Felipe Bravo-Marquez, Andrés Carvallo, Vladimir Araujo

Abstract

In recent years there have been considerable advances in pre-trained language models, and non-English versions of these models have also been made available. Due to their increasing use, many lightweight versions of these models (with fewer parameters) have also been released to speed up training and inference times. However, versions of these lighter models (e.g., ALBERT, DistilBERT) for languages other than English are still scarce. In this paper we present ALBETO and DistilBETO, which are versions of ALBERT and DistilBERT pre-trained exclusively on Spanish corpora. We train several versions of ALBETO ranging from 5M to 223M parameters and one version of DistilBETO with 67M parameters. We evaluate our models on the GLUES benchmark, which includes various natural language understanding tasks in Spanish. The results show that our lightweight models achieve results competitive with those of BETO (Spanish-BERT) despite having fewer parameters. More specifically, our larger ALBETO model outperforms all other models on the MLDoc, PAWS-X, XNLI, MLQA, SQAC and XQuAD datasets. However, BETO remains unbeaten on POS and NER. As a further contribution, all models are publicly available to the community for future research.
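The abstract states that all models are publicly released. Below is a minimal sketch of how such checkpoints are typically loaded with the Hugging Face `transformers` library; the model identifier is an assumption (the models appear under the dccuchile organization on the Hub), so check the authors' release for the exact names of each ALBETO/DistilBETO variant.

```python
# Minimal sketch: loading one of the released Spanish models with the
# Hugging Face transformers library. The identifier below is an assumption;
# consult the authors' release for the exact variant names.
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "dccuchile/albert-base-spanish"  # assumed Hub identifier

# Download (or load from cache) the tokenizer and masked-LM head.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Run a Spanish sentence through the model.
inputs = tokenizer(
    "Los modelos livianos reducen el tiempo de inferencia.",
    return_tensors="pt",
)
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch_size, sequence_length, vocab_size)
```

The same pattern applies to any of the ALBETO sizes or to DistilBETO by swapping the model identifier; the `Auto*` classes resolve the correct architecture (ALBERT or DistilBERT) from the checkpoint's configuration.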
