Paper Title
Are Sample-Efficient NLP Models More Robust?
Paper Authors
Paper Abstract
Recent work in image classification and extractive question answering has observed that pre-trained models trained on less in-distribution (ID) data have better out-of-distribution (OOD) performance. However, it is unclear how broadly these trends hold. We conduct a large empirical study across three tasks, three broadly-applicable modeling interventions (increasing model size, using a different adaptation method, and pre-training on more data), and 14 diverse datasets to investigate the relationship between sample efficiency (amount of data needed to reach a given ID accuracy) and robustness (how models fare on OOD evaluation). We find that higher sample efficiency is correlated with better average OOD robustness only for some modeling interventions and tasks, but not others. On individual datasets, models with lower sample efficiency can even be more robust. These results suggest that general-purpose methods for improving sample efficiency are unlikely to yield universal OOD robustness improvements, since such improvements are highly dataset- and task-dependent. Even in an era of large, multi-purpose pre-trained models, task-specific decisions may often be necessary for OOD generalization.