Paper Title

IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages

Authors

Emanuele Bugliarello, Fangyu Liu, Jonas Pfeiffer, Siva Reddy, Desmond Elliott, Edoardo Maria Ponti, Ivan Vulić

Abstract

Reliable evaluation benchmarks designed for replicability and comprehensiveness have driven progress in machine learning. Due to the lack of a multilingual benchmark, however, vision-and-language research has mostly focused on English language tasks. To fill this gap, we introduce the Image-Grounded Language Understanding Evaluation benchmark. IGLUE brings together (by both aggregating pre-existing datasets and creating new ones) visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages. Our benchmark enables the evaluation of multilingual multimodal models for transfer learning, not only in a zero-shot setting, but also in newly defined few-shot learning setups. Based on the evaluation of the available state-of-the-art models, we find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks. Moreover, downstream performance is partially explained by the amount of available unlabelled textual data for pretraining, and only weakly by the typological distance of target-source languages. We hope to encourage future research efforts in this area by releasing the benchmark to the community.
