Paper Title

Surgical Fine-Tuning Improves Adaptation to Distribution Shifts

Paper Authors

Yoonho Lee, Annie S. Chen, Fahim Tajwar, Ananya Kumar, Huaxiu Yao, Percy Liang, Chelsea Finn

Paper Abstract

A common approach to transfer learning under distribution shift is to fine-tune the last few layers of a pre-trained model, preserving learned features while also adapting to the new task. This paper shows that in such settings, selectively fine-tuning a subset of layers (which we term surgical fine-tuning) matches or outperforms commonly used fine-tuning approaches. Moreover, the type of distribution shift influences which subset is more effective to tune: for example, for image corruptions, fine-tuning only the first few layers works best. We validate our findings systematically across seven real-world data tasks spanning three types of distribution shifts. Theoretically, we prove that for two-layer neural networks in an idealized setting, first-layer tuning can outperform fine-tuning all layers. Intuitively, fine-tuning more parameters on a small target dataset can cause information learned during pre-training to be forgotten, and the relevant information depends on the type of shift.
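
To make the idea concrete, the sketch below freezes all parameters of a pre-trained network and unfreezes only one chosen block before training. This is a minimal PyTorch-style sketch, assuming a torchvision ResNet-50; unfreezing the earliest stage (conv1, bn1, layer1) mirrors the abstract's observation that tuning the first few layers works best under image corruptions, but the model choice, layer names, and hyperparameters are illustrative, not the authors' exact implementation.

```python
# Minimal sketch of surgical fine-tuning: freeze the pre-trained network,
# then unfreeze and train only a selected subset of layers.
# Assumes torchvision is installed; the ResNet-50 backbone and the choice
# of the first stage are illustrative, not the paper's exact setup.
import torch
import torchvision

model = torchvision.models.resnet50(weights="IMAGENET1K_V1")

# Freeze every parameter by default, preserving pre-trained features.
for p in model.parameters():
    p.requires_grad = False

# Unfreeze only the chosen subset. Here: the stem and first residual stage,
# the kind of "first few layers" the abstract highlights for image corruptions.
for module in (model.conv1, model.bn1, model.layer1):
    for p in module.parameters():
        p.requires_grad = True

# Optimize only the trainable parameters; the rest of the network keeps its
# pre-trained weights unchanged.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-3,
    momentum=0.9,
)
```

Selecting a different subset, for example only the final classification layer (`model.fc`), follows the same freeze/unfreeze pattern; only the set of modules marked trainable changes.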
