Paper Title
When to Use Multi-Task Learning vs Intermediate Fine-Tuning for Pre-Trained Encoder Transfer Learning
Paper Authors
Paper Abstract
Transfer learning (TL) in natural language processing (NLP) has seen a surge of interest in recent years, as pre-trained models have shown an impressive ability to transfer to novel tasks. Three main strategies have emerged for making use of multiple supervised datasets during fine-tuning: training on an intermediate task before training on the target task (STILTs), using multi-task learning (MTL) to train jointly on a supplementary task and the target task (pairwise MTL), or simply using MTL to train jointly on all available datasets (MTL-ALL). In this work, we compare all three TL methods in a comprehensive analysis on the GLUE dataset suite. We find that there is a simple heuristic for when to use one of these techniques over the other: pairwise MTL is better than STILTs when the target task has fewer instances than the supporting task and vice versa. We show that this holds true in more than 92% of applicable cases on the GLUE dataset and validate this hypothesis with experiments varying dataset size. The simplicity and effectiveness of this heuristic are surprising and warrant additional exploration by the TL community. Furthermore, we find that MTL-ALL is worse than the pairwise methods in almost every case. We hope this study will aid others as they choose between TL methods for NLP tasks.
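To make the heuristic concrete, here is a minimal sketch (not from the paper) of the decision rule the abstract describes. The function name and the approximate GLUE training-set sizes in the usage example are illustrative assumptions, not values reported by the authors.

```python
def choose_transfer_strategy(num_target_examples: int,
                             num_supporting_examples: int) -> str:
    """Decision rule paraphrasing the paper's heuristic:
    prefer pairwise MTL when the target task is smaller than the
    supporting task, and intermediate fine-tuning (STILTs) otherwise."""
    if num_target_examples < num_supporting_examples:
        return "pairwise MTL"  # jointly train on supporting + target task
    return "STILTs"            # fine-tune on supporting task, then on target

# Hypothetical example: RTE (~2.5k training examples) as the target task,
# MNLI (~393k training examples) as the supporting task.
print(choose_transfer_strategy(2_500, 393_000))  # -> "pairwise MTL"
```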