Title
Aligning MAGMA by Few-Shot Learning and Finetuning
Authors
Abstract
The goal of vision-language modeling is to let models ground language understanding in visual inputs. The aim of this paper is to evaluate MAGMA (Multimodal Augmentation of Generative Models through Adapter-based finetuning), a Visual Language Model (VLM), for alignment with human values, and to align it with those values. MAGMA is a VLM capable of image captioning and visual question answering. We evaluate its alignment in three scenarios. First, we assess MAGMA's out-of-the-box alignment using the checkpoint available on Hugging Face. Next, we measure whether few-shot learning improves the results. Finally, we finetune the model on aligned examples and evaluate its behavior.