Paper Title

Eliciting and Understanding Cross-Task Skills with Task-Level Mixture-of-Experts

Authors

Qinyuan Ye, Juan Zha, Xiang Ren

Abstract

Recent work suggests that transformer models are capable of multi-tasking on diverse NLP tasks and of adapting to new tasks efficiently. However, the potential of these multi-task models may be limited because they use the same set of parameters for all tasks. In contrast, humans tackle tasks in a more flexible way, making proper presumptions about which skills and knowledge are relevant and executing only the necessary computations. Inspired by this, we propose to use task-level mixture-of-experts models, which have a collection of transformer layers (i.e., experts) and a router component that chooses from these experts dynamically and flexibly. We find that these models help improve the average performance gain (ARG) metric by 2.6% when adapting to unseen tasks in the few-shot setting and by 5.6% in the zero-shot generalization setting. Further, we show that the learned routing decisions partly rediscover the human categorization of NLP tasks -- certain experts are strongly associated with extractive tasks, some with classification tasks, and some with tasks requiring world knowledge.
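To make the architecture concrete, below is a minimal PyTorch sketch of a task-level mixture-of-experts layer. It is an illustration under stated assumptions, not the authors' released implementation: all class names, dimensions, and the soft-routing choice are hypothetical. Each expert is a transformer encoder layer, and a router maps a learned task embedding to weights over experts, so every example from the same task is routed identically.

```python
import torch
import torch.nn as nn

class TaskLevelMoE(nn.Module):
    """Sketch of task-level routing (hypothetical names/dims, not the paper's code)."""

    def __init__(self, d_model=512, n_heads=8, n_experts=4, n_tasks=100):
        super().__init__()
        # Each expert is one transformer encoder layer.
        self.experts = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_experts)
        )
        self.task_emb = nn.Embedding(n_tasks, d_model)  # one embedding per task
        self.router = nn.Linear(d_model, n_experts)     # task embedding -> expert logits

    def forward(self, x, task_id):
        # Routing is per task, not per token: every example from the same
        # task is processed by the same expert mixture.
        logits = self.router(self.task_emb(task_id))     # shape: (n_experts,)
        weights = torch.softmax(logits, dim=-1)
        # Soft mixture for illustration; a hard top-1 router would instead
        # run only the expert at weights.argmax().
        return sum(w * expert(x) for w, expert in zip(weights, self.experts))

# Usage: a batch of token embeddings, all from the same task.
moe = TaskLevelMoE()
x = torch.randn(2, 16, 512)        # (batch, seq_len, d_model)
out = moe(x, torch.tensor(3))      # all examples share task id 3
print(out.shape)                   # torch.Size([2, 16, 512])
```

A hard top-1 variant (e.g., trained with a Gumbel-softmax or straight-through estimator) would keep a single expert active per task, which is closer to the "executing only the necessary computations" behavior the abstract motivates.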
