Paper Title

Go-tuning: Improving Zero-shot Learning Abilities of Smaller Language Models

Paper Authors

Jingjing Xu, Qingxiu Dong, Hongyi Liu, Lei Li

Paper Abstract

With increasing scale, large language models demonstrate both quantitative improvement and new qualitative capabilities, especially as zero-shot learners, like GPT-3. However, these results rely heavily on delicate prompt design and large computation. In this work, we explore whether the strong zero-shot ability could be achieved at a smaller model scale without any external supervised data. To achieve this goal, we revisit masked language modeling and present a geometry-guided self-supervised learning method (Go-tuning for short) by taking a small number of task-aware self-supervised data to update language models further. Experiments show that Go-tuning can enable T5-small (80M) to achieve competitive zero-shot results compared with large language models, such as T5-XL (3B). We also apply Go-tuning in multi-task settings and develop a multi-task model, mgo-T5 (250M). It can reach the average performance of OPT (175B) on 9 datasets.
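As a rough illustration of the recipe the abstract describes (continuing masked-language-model training on a small amount of unlabeled, task-related text), here is a minimal sketch using the Hugging Face transformers library and T5-small. It is not the authors' implementation: the geometry-guided selection of task-aware examples, which is the paper's core contribution, is only stubbed out as a hypothetical select_task_aware_texts helper, and the example texts and hyperparameters are placeholders.

```python
# Minimal sketch (not the authors' code): continued masked-language-model
# updates on a small pool of task-related, unlabeled texts with T5-small.
import random
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def mask_one_span(text, span_len=3):
    """Corrupt one random span, T5-style: the input keeps a sentinel token
    where the span was, and the target is the sentinel plus the span."""
    tokens = text.split()
    if len(tokens) <= span_len:
        return None
    start = random.randrange(0, len(tokens) - span_len)
    span = " ".join(tokens[start:start + span_len])
    corrupted = " ".join(tokens[:start] + ["<extra_id_0>"] + tokens[start + span_len:])
    target = "<extra_id_0> " + span + " <extra_id_1>"
    return corrupted, target

def select_task_aware_texts(pool):
    # Hypothetical placeholder: the paper selects a small number of
    # self-supervised examples guided by geometric similarity to the
    # target task; here the pool is returned unchanged.
    return pool

unlabeled_pool = [
    "the movie was surprisingly moving and well acted",
    "the service at this restaurant was slow and the food arrived cold",
]

model.train()
for text in select_task_aware_texts(unlabeled_pool):
    pair = mask_one_span(text)
    if pair is None:
        continue
    corrupted, target = pair
    inputs = tokenizer(corrupted, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids
    loss = model(**inputs, labels=labels).loss  # standard seq2seq MLM loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The design point the abstract emphasizes is that only a small, task-aware slice of self-supervised data is needed, so in practice the selection step (stubbed above) carries most of the weight rather than the training loop itself.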
