Paper Title
Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks
Paper Authors
Paper Abstract
Although large language models have achieved impressive zero-shot ability, the huge model size generally incurs high cost. Recently, semi-parametric language models, which augment a smaller language model with an external retriever, have demonstrated promising language modeling capabilities. However, it remains unclear whether such semi-parametric language models can perform competitively with their fully-parametric counterparts on zero-shot generalization to downstream tasks. In this work, we introduce $\text{Zemi}$, a zero-shot semi-parametric language model. To the best of our knowledge, this is the first semi-parametric language model that demonstrates strong zero-shot performance on a wide range of held-out unseen tasks. We train $\text{Zemi}$ with a novel semi-parametric multitask prompted training paradigm, which shows significant improvement compared with the parametric multitask prompted training proposed by T0. Specifically, we augment the multitask training and zero-shot evaluation with retrieval from a large-scale task-agnostic unlabeled corpus. In order to incorporate multiple potentially noisy retrieved augmentations, we further propose a novel $\text{augmentation fusion}$ module leveraging a perceiver resampler and gated cross-attention. Notably, our proposed $\text{Zemi}_\text{LARGE}$ outperforms T0-3B by 16% on all seven evaluation tasks while being 3.9x smaller in model size.
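To make the augmentation-fusion idea concrete, below is a minimal sketch of how multiple retrieved passages could be compressed by a perceiver-resampler-style bottleneck and injected into a backbone's hidden states through gated cross-attention. This is not the authors' reference implementation: the class names, dimensions, number of latents, and the zero-initialized tanh gate are illustrative assumptions.

```python
# Illustrative sketch (assumed, not the paper's code): fuse multiple retrieved
# augmentations via a perceiver-resampler bottleneck + gated cross-attention.
import torch
import torch.nn as nn


class PerceiverResampler(nn.Module):
    """Compress a variable number of augmentation tokens into a fixed set of
    latent vectors via cross-attention (latents attend to augmentation tokens)."""

    def __init__(self, d_model: int, num_latents: int = 64, num_heads: int = 8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, d_model))
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, aug_tokens: torch.Tensor) -> torch.Tensor:
        # aug_tokens: (batch, num_aug_tokens, d_model), e.g. concatenated
        # encodings of several retrieved passages.
        batch = aug_tokens.size(0)
        latents = self.latents.unsqueeze(0).expand(batch, -1, -1)
        fused, _ = self.cross_attn(latents, aug_tokens, aug_tokens)
        return self.norm(fused + latents)  # (batch, num_latents, d_model)


class GatedCrossAttentionBlock(nn.Module):
    """Inject resampled augmentation latents into the backbone hidden states.
    The learnable tanh gate starts at zero, so training begins from the
    unaugmented model's behavior (an assumption, mirroring common practice)."""

    def __init__(self, d_model: int, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, hidden: torch.Tensor, latents: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model) from the backbone language model.
        attended, _ = self.cross_attn(self.norm(hidden), latents, latents)
        return hidden + torch.tanh(self.gate) * attended


if __name__ == "__main__":
    d_model = 512
    resampler = PerceiverResampler(d_model)
    fusion = GatedCrossAttentionBlock(d_model)
    aug = torch.randn(2, 3 * 128, d_model)  # e.g. 3 retrieved passages, 128 tokens each
    hidden = torch.randn(2, 64, d_model)    # backbone hidden states
    out = fusion(hidden, resampler(aug))
    print(out.shape)  # torch.Size([2, 64, 512])
```

The latent bottleneck keeps the cross-attention cost fixed regardless of how many (possibly noisy) augmentations are retrieved, which is the property the fusion module is meant to provide.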