Paper Title
Composing Ensembles of Pre-trained Models via Iterative Consensus
Paper Authors
Paper Abstract
Large pre-trained models exhibit distinct and complementary capabilities dependent on the data they are trained on. Language models such as GPT-3 are capable of textual reasoning but cannot understand visual information, while vision models such as DALL-E can generate photorealistic images but fail to understand complex language descriptions. In this work, we propose a unified framework for composing ensembles of different pre-trained models -- combining the strengths of each individual model to solve various multimodal problems in a zero-shot manner. We use pre-trained models as "generators" or "scorers" and compose them via closed-loop iterative consensus optimization. The generator constructs proposals and the scorers iteratively provide feedback to refine the generated result. Such closed-loop communication enables models to correct errors caused by other models, significantly boosting performance on downstream tasks, e.g. improving accuracy on grade school math problems by 7.5%, without requiring any model finetuning. We demonstrate that consensus achieved by an ensemble of scorers outperforms the feedback of a single scorer, by leveraging the strengths of each expert model. Results show that the proposed method can be used as a general-purpose framework for a wide range of zero-shot multimodal tasks, such as image generation, video question answering, mathematical reasoning, and robotic manipulation. Project page: https://energy-based-model.github.io/composing-pretrained-models.
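The closed-loop generator-scorer procedure described in the abstract can be summarized in a few lines of code. The sketch below is a minimal illustration, not the authors' implementation: the `Generator`/`Scorer` protocols, their method names, and the simple score-averaging consensus are all assumptions made for clarity.

```python
# Hypothetical sketch of closed-loop iterative consensus between one
# generator and an ensemble of scorers. All names and interfaces here
# are illustrative assumptions, not the paper's actual API.
from typing import Optional, Protocol, Sequence


class Generator(Protocol):
    def propose(self, prompt: str, feedback: Optional[float]) -> str:
        """Produce (or refine) a candidate solution, optionally conditioned
        on the feedback from the previous iteration."""
        ...


class Scorer(Protocol):
    def score(self, prompt: str, candidate: str) -> float:
        """Return a scalar judging how well the candidate fits the prompt."""
        ...


def iterative_consensus(
    generator: Generator,
    scorers: Sequence[Scorer],
    prompt: str,
    num_iterations: int = 10,
) -> str:
    """Return the candidate with the highest ensemble (consensus) score."""
    best_candidate, best_score = "", float("-inf")
    feedback: Optional[float] = None
    for _ in range(num_iterations):
        candidate = generator.propose(prompt, feedback)
        # Consensus: aggregate feedback across all expert scorers
        # (a plain average here; the paper's aggregation may differ).
        score = sum(s.score(prompt, candidate) for s in scorers) / len(scorers)
        if score > best_score:
            best_candidate, best_score = candidate, score
        # Closed loop: the score is fed back to guide the next proposal.
        feedback = score
    return best_candidate
```

The key design point the abstract emphasizes is the loop itself: because the scorers' feedback conditions the next proposal, errors introduced by one model can be corrected by the others, and averaging over an ensemble of scorers is claimed to outperform any single scorer's feedback.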