Title

What does a platypus look like? Generating customized prompts for zero-shot image classification

Authors

Sarah Pratt, Ian Covert, Rosanne Liu, Ali Farhadi

Abstract

Open-vocabulary models are a promising new paradigm for image classification. Unlike traditional classification models, open-vocabulary models classify among any arbitrary set of categories specified with natural language during inference. This natural language, called "prompts", typically consists of a set of hand-written templates (e.g., "a photo of a {}") which are completed with each of the category names. This work introduces a simple method to generate higher accuracy prompts, without relying on any explicit knowledge of the task domain and with far fewer hand-constructed sentences. To achieve this, we combine open-vocabulary models with large language models (LLMs) to create Customized Prompts via Language models (CuPL, pronounced "couple"). In particular, we leverage the knowledge contained in LLMs in order to generate many descriptive sentences that contain important discriminating characteristics of the image categories. This allows the model to place a greater importance on these regions in the image when making predictions. We find that this straightforward and general approach improves accuracy on a range of zero-shot image classification benchmarks, including over one percentage point gain on ImageNet. Finally, this simple baseline requires no additional training and remains completely zero-shot. Code available at https://github.com/sarahpratt/CuPL.
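The pipeline the abstract describes has two stages: an LLM generates many descriptive sentences per category, and an open-vocabulary model (CLIP) averages their text embeddings into per-class weights for zero-shot prediction. Below is a minimal sketch of that idea, assuming OpenAI's open-source CLIP package; the `generate_descriptions` stub is a hypothetical placeholder for the LLM step (the paper samples many completions for queries along the lines of "Describe what a {category} looks like"), not the authors' released code.

```python
# Minimal sketch of a CuPL-style zero-shot classifier, assuming OpenAI's
# CLIP package (pip install git+https://github.com/openai/CLIP.git).
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)


def generate_descriptions(category: str) -> list[str]:
    # Placeholder for the LLM stage: CuPL samples many completions for
    # queries such as "Describe what a {category} looks like." The
    # hardcoded sentences below are illustrative stand-ins only.
    return [
        f"A photo of a {category}.",
        f"A {category}, which has several visually distinctive features.",
    ]


@torch.no_grad()
def class_weight(category: str) -> torch.Tensor:
    # Encode each generated sentence and average the normalized text
    # embeddings into a single classifier weight for this category.
    tokens = clip.tokenize(generate_descriptions(category), truncate=True)
    feats = model.encode_text(tokens.to(device))
    feats = feats / feats.norm(dim=-1, keepdim=True)
    mean = feats.mean(dim=0)
    return mean / mean.norm()


@torch.no_grad()
def classify(pil_image, categories: list[str]) -> str:
    # Zero-shot prediction: cosine similarity between the image embedding
    # and each category's averaged prompt embedding.
    weights = torch.stack([class_weight(c) for c in categories])
    image = preprocess(pil_image).unsqueeze(0).to(device)
    feat = model.encode_image(image)
    feat = feat / feat.norm(dim=-1, keepdim=True)
    return categories[(feat @ weights.T).argmax().item()]
```

The embedding-averaging step mirrors the standard prompt-ensembling recipe for CLIP; what CuPL changes is only where the prompts come from, replacing hand-written templates with LLM-generated, category-specific descriptions.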
