Aligning MAGMA by Few-Shot Learning and Finetuning

Paper Title

Aligning MAGMA by Few-Shot Learning and Finetuning

Authors

Jean-Charles Layoun, Alexis Roger, Irina Rish

Abstract

The goal of vision-language modeling is to allow models to tie language understanding with visual inputs. The aim of this paper is to evaluate and align the Visual Language Model (VLM) called Multimodal Augmentation of Generative Models through Adapter-based finetuning (MAGMA) with human values. MAGMA is a VLM that is capable of image captioning and visual question-answering. We will evaluate its alignment in three different scenarios. To begin, we assess MAGMA's out-of-the-box alignment through the checkpoint provided by Hugging Face. Then, we measure if few-shot learning manages to improve the results. Finally, we finetune the model on aligned examples and evaluate its behavior.
