Paper Title

Go-tuning: Improving Zero-shot Learning Abilities of Smaller Language Models

Paper Authors

Jingjing Xu, Qingxiu Dong, Hongyi Liu, Lei Li

Paper Abstract

With increasing scale, large language models demonstrate both quantitative improvement and new qualitative capabilities, especially as zero-shot learners, like GPT-3. However, these results rely heavily on delicate prompt design and large computation. In this work, we explore whether the strong zero-shot ability could be achieved at a smaller model scale without any external supervised data. To achieve this goal, we revisit masked language modeling and present a geometry-guided self-supervised learning method (Go-tuning for short) by taking a small number of task-aware self-supervised data to update language models further. Experiments show that Go-tuning can enable T5-small (80M) to achieve competitive zero-shot results compared with large language models, such as T5-XL (3B). We also apply Go-tuning in multi-task settings and develop a multi-task model, mgo-T5 (250M). It can reach the average performance of OPT (175B) on 9 datasets.
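As a rough illustration of the recipe the abstract describes (continuing masked-language-model training on a small amount of unlabeled, task-related text), here is a minimal sketch using the Hugging Face transformers library and T5-small. It is not the authors' implementation: the geometry-guided selection of task-aware examples, which is the paper's core contribution, is only stubbed out as a hypothetical select_task_aware_texts helper, and the example texts and hyperparameters are placeholders.

```python
# Minimal sketch (not the authors' code): continued masked-language-model
# updates on a small pool of task-related, unlabeled texts with T5-small.
import random
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def mask_one_span(text, span_len=3):
    """Corrupt one random span, T5-style: the input keeps a sentinel token
    where the span was, and the target is the sentinel plus the span."""
    tokens = text.split()
    if len(tokens) <= span_len:
        return None
    start = random.randrange(0, len(tokens) - span_len)
    span = " ".join(tokens[start:start + span_len])
    corrupted = " ".join(tokens[:start] + ["<extra_id_0>"] + tokens[start + span_len:])
    target = "<extra_id_0> " + span + " <extra_id_1>"
    return corrupted, target

def select_task_aware_texts(pool):
    # Hypothetical placeholder: the paper selects a small number of
    # self-supervised examples guided by geometric similarity to the
    # target task; here the pool is returned unchanged.
    return pool

unlabeled_pool = [
    "the movie was surprisingly moving and well acted",
    "the service at this restaurant was slow and the food arrived cold",
]

model.train()
for text in select_task_aware_texts(unlabeled_pool):
    pair = mask_one_span(text)
    if pair is None:
        continue
    corrupted, target = pair
    inputs = tokenizer(corrupted, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids
    loss = model(**inputs, labels=labels).loss  # standard seq2seq MLM loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The design point the abstract emphasizes is that only a small, task-aware slice of self-supervised data is needed, so in practice the selection step (stubbed above) carries most of the weight rather than the training loop itself.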
