Paper Title
Composing Ensembles of Pre-trained Models via Iterative Consensus
Paper Authors
Paper Abstract
Large pre-trained models exhibit distinct and complementary capabilities dependent on the data they are trained on. Language models such as GPT-3 are capable of textual reasoning but cannot understand visual information, while vision models such as DALL-E can generate photorealistic images but fail to understand complex language descriptions. In this work, we propose a unified framework for composing ensembles of different pre-trained models -- combining the strengths of each individual model to solve various multimodal problems in a zero-shot manner. We use pre-trained models as "generators" or "scorers" and compose them via closed-loop iterative consensus optimization. The generator constructs proposals and the scorers iteratively provide feedback to refine the generated result. Such closed-loop communication enables models to correct errors caused by other models, significantly boosting performance on downstream tasks, e.g. improving accuracy on grade school math problems by 7.5%, without requiring any model finetuning. We demonstrate that consensus achieved by an ensemble of scorers outperforms the feedback of a single scorer, by leveraging the strengths of each expert model. Results show that the proposed method can be used as a general-purpose framework for a wide range of zero-shot multimodal tasks, such as image generation, video question answering, mathematical reasoning, and robotic manipulation. Project page: https://energy-based-model.github.io/composing-pretrained-models.
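The closed-loop generator-scorer procedure described in the abstract can be summarized in a few lines of code. The sketch below is a minimal illustration, not the authors' implementation: the `Generator`/`Scorer` protocols, their method names, and the simple score-averaging consensus are all assumptions made for clarity.

```python
# Hypothetical sketch of closed-loop iterative consensus between one
# generator and an ensemble of scorers. All names and interfaces here
# are illustrative assumptions, not the paper's actual API.
from typing import Optional, Protocol, Sequence


class Generator(Protocol):
    def propose(self, prompt: str, feedback: Optional[float]) -> str:
        """Produce (or refine) a candidate solution, optionally conditioned
        on the feedback from the previous iteration."""
        ...


class Scorer(Protocol):
    def score(self, prompt: str, candidate: str) -> float:
        """Return a scalar judging how well the candidate fits the prompt."""
        ...


def iterative_consensus(
    generator: Generator,
    scorers: Sequence[Scorer],
    prompt: str,
    num_iterations: int = 10,
) -> str:
    """Return the candidate with the highest ensemble (consensus) score."""
    best_candidate, best_score = "", float("-inf")
    feedback: Optional[float] = None
    for _ in range(num_iterations):
        candidate = generator.propose(prompt, feedback)
        # Consensus: aggregate feedback across all expert scorers
        # (a plain average here; the paper's aggregation may differ).
        score = sum(s.score(prompt, candidate) for s in scorers) / len(scorers)
        if score > best_score:
            best_candidate, best_score = candidate, score
        # Closed loop: the score is fed back to guide the next proposal.
        feedback = score
    return best_candidate
```

The key design point the abstract emphasizes is the loop itself: because the scorers' feedback conditions the next proposal, errors introduced by one model can be corrected by the others, and averaging over an ensemble of scorers is claimed to outperform any single scorer's feedback.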