Paper Title
CPL: Counterfactual Prompt Learning for Vision and Language Models
Paper Authors
Paper Abstract
Prompt tuning is a new few-shot transfer learning technique that only tunes the learnable prompts for pre-trained vision and language models such as CLIP. However, existing prompt tuning methods tend to learn spurious or entangled representations, which leads to poor generalization to unseen concepts. Towards non-spurious and efficient prompt learning from limited examples, this paper presents a novel \underline{\textbf{C}}ounterfactual \underline{\textbf{P}}rompt \underline{\textbf{L}}earning (CPL) method for vision and language models, which simultaneously employs counterfactual generation and contrastive learning in a joint optimization framework. In particular, CPL constructs counterfactuals by identifying the minimal non-spurious feature change between semantically similar positive and negative samples that causes a concept change, and learns more generalizable prompt representations from both factual and counterfactual examples via contrastive learning. Extensive experiments demonstrate that CPL obtains superior few-shot performance on different vision and language tasks compared with previous prompt tuning methods on CLIP. On image classification, we achieve a 3.55\% average relative improvement on unseen classes across seven datasets; on image-text retrieval and visual question answering, we gain up to 4.09\% and 25.08\% relative improvements across three few-shot scenarios on unseen test sets, respectively.
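The abstract's core idea can be sketched in code. The following is a minimal, hypothetical NumPy illustration (not the authors' implementation): a counterfactual image feature is formed by swapping a small, soft-masked portion of a positive sample's features with those of a semantically similar negative sample, and a contrastive term then encourages the prompt-conditioned text feature to match the factual feature more than the counterfactual one. All function and variable names are assumptions; in CPL the mask is optimized to be minimal and non-spurious, whereas here it is random purely for illustration.

```python
import numpy as np

def counterfactual_contrastive_loss(pos_feat, neg_feat, text_feats,
                                    tau=0.07, rng=None):
    """Hypothetical sketch of CPL-style counterfactual contrastive learning.

    pos_feat:   (d,) image feature of a positive example
    neg_feat:   (d,) image feature of a semantically similar negative example
    text_feats: (k, d) prompt-conditioned text features; row 0 = true class
    """
    rng = rng or np.random.default_rng(0)

    def cos_sim(a, B):
        # Cosine similarity between vector a and each row of B.
        return (B @ a) / (np.linalg.norm(B, axis=1) * np.linalg.norm(a) + 1e-8)

    # Soft mask u in (0, 1): in the paper it is optimized to be sparse
    # (a *minimal* non-spurious change); here it is random for illustration.
    u = 1.0 / (1.0 + np.exp(-rng.standard_normal(pos_feat.shape)))
    cf_feat = (1 - u) * pos_feat + u * neg_feat   # counterfactual feature

    # Factual term: standard CLIP-style cross-entropy, the image should
    # match its own prompt (row 0) under a softmax over classes.
    pos_logits = cos_sim(pos_feat, text_feats) / tau
    m = pos_logits.max()
    factual_loss = -pos_logits[0] + m + np.log(np.exp(pos_logits - m).sum())

    # Contrastive term: the factual match to the true class should exceed
    # the counterfactual's match to it; softplus(-margin) penalizes violations.
    cf_logits = cos_sim(cf_feat, text_feats) / tau
    margin = pos_logits[0] - cf_logits[0]
    contrast = np.log1p(np.exp(-margin))

    return factual_loss + contrast
```

In the actual method both the prompt parameters and the mask would be trained jointly; this sketch only shows how the two loss terms fit together for one sample.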