Paper Title
Prompt Vision Transformer for Domain Generalization
Paper Authors
Paper Abstract
Although vision transformers (ViTs) have shown impressive representation learning ability, we empirically find that they do not generalize well to unseen domains when used with previous domain generalization algorithms. In this paper, we propose DoPrompt, a novel approach based on prompt learning that embeds the knowledge of source domains in domain prompts for target domain prediction. Specifically, domain prompts are prepended to the ViT input tokens from the corresponding source domain. Each domain prompt learns domain-specific knowledge efficiently since it is optimized for only one domain. Meanwhile, we train a prompt adapter to produce a suitable prompt for each input image based on the learned source domain prompts. At test time, the adapted prompt generated by the prompt adapter can exploit the similarity between the features of the out-of-domain image and those of the source domains to properly integrate the source domain knowledge. Extensive experiments are conducted on four benchmark datasets. Our approach achieves a 1.4% improvement in average accuracy, which is 3.5 times the improvement of the state-of-the-art algorithm with a ViT backbone.
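The mechanism described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that the adapted prompt is a softmax-weighted combination of the source domain prompts, and that the weights come from a dot-product similarity between a pooled image feature and one key vector per domain. The names `domain_keys`, `adapt_prompt`, and `prepend_prompt`, as well as all sizes, are hypothetical and chosen only for illustration.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def prepend_prompt(prompt, tokens):
    """Put the prompt tokens in front of the patch tokens."""
    return prompt + tokens

def adapt_prompt(domain_prompts, weights):
    """Weighted sum of the per-domain prompts (assumed form of the adapter output)."""
    num_tokens = len(domain_prompts[0])
    dim = len(domain_prompts[0][0])
    adapted = [[0.0] * dim for _ in range(num_tokens)]
    for w, prompt in zip(weights, domain_prompts):
        for t in range(num_tokens):
            for d in range(dim):
                adapted[t][d] += w * prompt[t][d]
    return adapted

def rand_vec(dim):
    """Random vector standing in for a learned parameter or embedding."""
    return [random.gauss(0.0, 1.0) for _ in range(dim)]

random.seed(0)
DIM, NUM_PATCHES, PROMPT_LEN, NUM_DOMAINS = 4, 6, 2, 3  # toy sizes

# One prompt (a short token sequence) per source domain.
domain_prompts = [[rand_vec(DIM) for _ in range(PROMPT_LEN)]
                  for _ in range(NUM_DOMAINS)]
# Hypothetical per-domain keys used to score similarity to an image.
domain_keys = [rand_vec(DIM) for _ in range(NUM_DOMAINS)]

# Patch tokens of one input image (random stand-ins for ViT embeddings).
tokens = [rand_vec(DIM) for _ in range(NUM_PATCHES)]

# Training on a source image: prepend the prompt of its own domain.
train_sequence = prepend_prompt(domain_prompts[0], tokens)

# Test on an unseen domain: weight the source prompts by similarity.
image_feature = [sum(tok[d] for tok in tokens) / NUM_PATCHES for d in range(DIM)]
weights = softmax([dot(image_feature, key) for key in domain_keys])
adapted = adapt_prompt(domain_prompts, weights)
test_sequence = prepend_prompt(adapted, tokens)
```

In this sketch the transformer itself is left out. Only the construction of the input sequence is shown, since that is the part the abstract describes.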