Paper Title
Pro-tuning: Unified Prompt Tuning for Vision Tasks
Paper Authors
Paper Abstract
In computer vision, fine-tuning is the de facto approach to leveraging pre-trained vision models for downstream tasks. However, it is challenging to deploy in practice because it performs parameter-inefficient global updates and relies heavily on high-quality downstream data. Recently, prompt-based learning, which adds task-relevant prompts to adapt downstream tasks to pre-trained models, has drastically boosted performance on many natural-language downstream tasks. In this work, we extend this notable prompt-driven transfer ability to vision models as an alternative to fine-tuning. To this end, we propose parameter-efficient Prompt tuning (Pro-tuning) to adapt frozen vision models to various downstream vision tasks. The key to Pro-tuning is prompt-based tuning, i.e., learning task-specific vision prompts for downstream input images while keeping the pre-trained model frozen. By training only a few additional parameters, it works on diverse CNN-based and Transformer-based architectures. Extensive experiments show that Pro-tuning outperforms fine-tuning on a broad range of vision tasks and scenarios, including image classification (generic objects, class imbalance, image corruption, adversarial robustness, and out-of-distribution generalization) and dense prediction tasks such as object detection and semantic segmentation.
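The recipe described in the abstract, freezing the pre-trained backbone and optimizing only a small prompt module plus a task head, can be sketched as follows. This is a minimal illustration under assumed design choices (a torchvision ResNet-50 backbone and a residual adapter-style prompt module named PromptTunedClassifier), not the paper's exact prompt-block architecture.

# Minimal sketch of prompt-based tuning with a frozen backbone.
# Assumptions: torchvision ResNet-50 features (2048-d) and a small
# residual prompt module; the real Pro-tuning block may differ.
import torch
import torch.nn as nn
import torchvision.models as models


class PromptTunedClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # Pre-trained backbone, kept frozen during downstream training.
        self.backbone = models.resnet50(
            weights=models.ResNet50_Weights.IMAGENET1K_V2
        )
        self.backbone.fc = nn.Identity()  # expose 2048-d features
        for p in self.backbone.parameters():
            p.requires_grad = False

        # Small trainable "prompt" module that adapts the frozen
        # features to the downstream task (illustrative design).
        self.prompt = nn.Sequential(
            nn.Linear(2048, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, 2048),
        )
        self.head = nn.Linear(2048, num_classes)  # task-specific classifier

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():  # no gradients flow into the frozen backbone
            feat = self.backbone(x)
        feat = feat + self.prompt(feat)  # prompt acts as a residual adapter
        return self.head(feat)


model = PromptTunedClassifier(num_classes=100)
# Only the prompt and head parameters are optimized.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

Because the backbone parameters have requires_grad set to False, the optimizer updates only the prompt module and the classification head, which is what makes this style of adaptation parameter-efficient.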