Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning

Paper Title

Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning

Authors

Zafir Stojanovski, Karsten Roth, Zeynep Akata

Abstract

Large pre-trained, zero-shot capable models have shown considerable success on both standard transfer and adaptation tasks, with particular robustness to distribution shifts. In addition, subsequent fine-tuning can considerably improve performance on a selected downstream task. However, under naive fine-tuning, these zero-shot models lose their generalizability and their robustness to distribution shifts. This is a particular problem for tasks such as Continual Learning (CL), where adaptation has to be performed continuously as new task distributions are introduced sequentially. In this work, we show that where fine-tuning falls short in adapting such zero-shot capable models, simple momentum-based weight interpolation can provide consistent improvements for CL tasks in both memory-free and memory-based settings. In particular, we find improvements of over $+4\%$ on standard CL benchmarks, while in some cases reducing the gap to the upper bound set by joint training on all tasks at once by more than half, allowing the continual learner to inch closer to the joint training limit.
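The abstract does not state the interpolation rule itself, so the following is only a minimal sketch of what momentum-based weight interpolation could look like. It assumes an exponential-moving-average form, in which a slow copy of the weights is pulled toward the fine-tuned weights by a momentum coefficient. The function name `momentum_interpolate`, the parameter `tau`, and the toy weight dictionaries are all hypothetical and are not taken from the paper.

```python
def momentum_interpolate(slow, fast, tau=0.99):
    """Return tau * slow + (1 - tau) * fast for every parameter.

    Assumption: weights are plain dicts mapping a parameter name to a
    flat list of floats. `tau` is a hypothetical momentum coefficient;
    values close to 1 keep the slow weights near their starting point.
    """
    if slow.keys() != fast.keys():
        raise ValueError("slow and fast weights must share parameter names")
    return {
        name: [tau * s + (1.0 - tau) * f for s, f in zip(slow[name], fast[name])]
        for name in slow
    }


# Toy weights standing in for a zero-shot model and its fine-tuned copy.
zero_shot = {"w": [1.0, -2.0], "b": [0.5]}
fine_tuned = {"w": [3.0, 0.0], "b": [1.5]}

# With tau = 0.5 the result is the midpoint of the two weight sets.
slow = momentum_interpolate(zero_shot, fine_tuned, tau=0.5)
print(slow)
```

In a continual learning loop, one would presumably initialise the slow weights from the zero-shot model, fine-tune a separate fast copy on each incoming task, and apply an update of this kind at regular intervals. How often to update and what momentum value to use are design choices the abstract leaves open.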
