Paper Title
Addressing Distribution Shift at Test Time in Pre-trained Language Models
Paper Authors
Paper Abstract
State-of-the-art pre-trained language models (PLMs) outperform other models when applied to the majority of language processing tasks. However, PLMs have been found to degrade in performance under distribution shift, a phenomenon that occurs when the data seen at test time does not come from the same distribution as the source training set. Equally challenging is obtaining labels in real time, owing to issues such as long labeling feedback loops. The lack of adequate methods for these challenges underscores the need for approaches that continuously adapt a PLM to a distinct distribution. Unsupervised domain adaptation adapts a source model to an unseen, unlabeled target domain. While techniques such as data augmentation can adapt models in several scenarios, they have only been sparsely studied as a remedy for distribution shift. In this work, we present an approach, MEMO-CL, that improves the performance of PLMs at test time under distribution shift. Our approach leverages recent unsupervised techniques in data augmentation and adaptation to minimize the entropy of the PLM's output distribution. MEMO-CL operates on a batch of augmented samples generated from a single observation in the test set. The technique is unsupervised, domain-agnostic, easy to implement, and requires no additional data. Our experiments yield a 3% improvement over current test-time adaptation baselines.
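To make the described procedure concrete, below is a minimal sketch of one marginal-entropy-minimization test-time update in the spirit of MEMO-CL, using PyTorch and a HuggingFace sequence classifier. The model name, the toy word-dropout augmentation, the learning rate, and the number of augmentations are illustrative assumptions, not the paper's actual configuration.

```python
# Sketch of MEMO-style test-time adaptation for a PLM classifier:
# generate augmentations of a single test input, average the model's
# predicted distributions, and take one gradient step that minimizes
# the entropy of that marginal distribution before predicting.
import random
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed source PLM
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)  # assumed step size

def word_dropout(text: str, p: float = 0.1) -> str:
    """Toy text augmentation: randomly drop words.
    A stand-in for the paper's (unspecified here) augmentation policy."""
    words = text.split()
    kept = [w for w in words if random.random() > p]
    return " ".join(kept) if kept else text

def adapt_and_predict(text: str, n_aug: int = 8) -> int:
    """One unsupervised adaptation step on a single test observation,
    followed by a prediction on the original (unaugmented) input."""
    # Build a batch of augmented copies of the single test point.
    augmented = [word_dropout(text) for _ in range(n_aug)]
    batch = tokenizer(augmented, padding=True, truncation=True, return_tensors="pt")

    # Marginal output distribution: average class probabilities over augmentations.
    model.train()
    probs = F.softmax(model(**batch).logits, dim=-1)  # (n_aug, num_classes)
    marginal = probs.mean(dim=0)

    # Entropy of the marginal distribution; minimizing it encourages
    # confident, augmentation-consistent predictions.
    entropy = -(marginal * marginal.clamp_min(1e-12).log()).sum()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()  # a single gradient step per test point

    # Predict on the original input with the adapted weights.
    model.eval()
    with torch.no_grad():
        inputs = tokenizer(text, return_tensors="pt")
        return model(**inputs).logits.argmax(dim=-1).item()

print(adapt_and_predict("the film was surprisingly good"))
```

A single gradient step per observation keeps adaptation cheap and requires no labels or extra data, matching the abstract's claims; whether weights are reset between test points (episodic adaptation) or carried forward continuously is a design choice the abstract does not settle, so the sketch simply carries them forward.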