Paper Title

Supporting Vision-Language Model Inference with Confounder-pruning Knowledge Prompt

Paper Authors

Li, Jiangmeng; Mo, Wenyi; Qiang, Wenwen; Su, Bing; Zheng, Changwen; Xiong, Hui; Wen, Ji-Rong

Paper Abstract

Vision-language models are pre-trained by aligning image-text pairs in a common space to deal with open-set visual concepts. To boost the transferability of the pre-trained models, recent works adopt fixed or learnable prompts, i.e., classification weights are synthesized from natural language describing task-relevant categories, to reduce the gap between tasks in the training and test phases. However, how and what prompts can improve inference performance remains unclear. In this paper, we explicitly clarify the importance of including semantic information in prompts, while existing prompting methods generate prompts without exploring the semantic information of textual labels. Manually constructing prompts with rich semantics requires domain expertise and is extremely time-consuming. To cope with this issue, we propose a semantic-aware prompt learning method, namely CPKP, which retrieves an ontological knowledge graph by treating the textual label as a query to extract task-relevant semantic information. CPKP further introduces a double-tier confounder-pruning procedure to refine the derived semantic information. The graph-tier confounders are gradually identified and phased out, inspired by the principle of Granger causality. The feature-tier confounders are demolished by following the maximum entropy principle in information theory. Empirically, the evaluations demonstrate the effectiveness of CPKP, e.g., with two shots, CPKP outperforms the manual-prompt method by 4.64% and the learnable-prompt method by 1.09% on average, and the superiority of CPKP in domain generalization compared to benchmark approaches. Our implementation is available at https://github.com/Mowenyii/CPKP.
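
The abstract outlines the high-level recipe: the textual class label is used as a query into an ontological knowledge graph, and the retrieved, confounder-pruned semantic information is turned into a prompt for the text encoder. The snippet below is a minimal, illustrative PyTorch sketch of that idea, not the authors' released implementation: the toy ontology, the KnowledgePrompt module, the prune_graph_tier ablation check (a loose, Granger-causality-style relevance test), and the threshold tol are all assumptions made purely for illustration.

```python
# Illustrative sketch only; not the CPKP implementation.
import torch
import torch.nn as nn

# Toy ontology: label -> related concepts (stand-in for a real knowledge graph).
ONTOLOGY = {
    "sparrow": ["bird", "feather", "beak", "sky"],
    "goldfish": ["fish", "fin", "water", "aquarium"],
}

class KnowledgePrompt(nn.Module):
    """Fuses embeddings of retrieved concepts into a single prompt vector."""

    def __init__(self, vocab, dim=64):
        super().__init__()
        self.vocab = {w: i for i, w in enumerate(vocab)}
        self.embed = nn.Embedding(len(vocab), dim)
        self.fuse = nn.Linear(dim, dim)

    def forward(self, concepts):
        ids = torch.tensor([self.vocab[c] for c in concepts])
        # Mean-pool the concept embeddings, then project to the prompt space.
        return self.fuse(self.embed(ids).mean(dim=0))

def prune_graph_tier(model, concepts, tol=0.05):
    """Drop a concept if ablating it barely changes the fused prompt."""
    with torch.no_grad():
        full = model(concepts)
        kept = []
        for c in concepts:
            reduced = [x for x in concepts if x != c]
            ablated = model(reduced) if reduced else torch.zeros_like(full)
            # Relative change when the concept is removed; small change => prune.
            change = torch.norm(full - ablated) / (torch.norm(full) + 1e-8)
            if change > tol:
                kept.append(c)
    return kept

if __name__ == "__main__":
    label = "sparrow"
    concepts = ONTOLOGY[label]  # the label acts as the retrieval query
    vocab = sorted({w for ws in ONTOLOGY.values() for w in ws})
    model = KnowledgePrompt(vocab)
    kept = prune_graph_tier(model, concepts)
    prompt_vec = model(kept if kept else concepts)  # prompt fed to the text encoder
    print("kept concepts:", kept, "prompt shape:", tuple(prompt_vec.shape))
```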
