Paper Title

No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval

Authors

Guilherme Moraes Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira

Abstract

Recent work has shown that small distilled language models are strong competitors to models that are orders of magnitude larger and slower in a wide range of information retrieval tasks. Due to latency constraints, this has made distilled and dense models the go-to choice for deployment in real-world retrieval applications. In this work, we question this practice by showing that the number of parameters and early query-document interaction play a significant role in the generalization ability of retrieval models. Our experiments show that increasing model size results in marginal gains on in-domain test sets, but much larger gains in new domains never seen during fine-tuning. Furthermore, we show that rerankers largely outperform dense retrievers of similar size in several tasks. Our largest reranker reaches the state of the art in 12 of the 18 datasets of Benchmark-IR (BEIR) and surpasses the previous state of the art by 3 average points. Finally, we confirm that in-domain effectiveness is not a good indicator of zero-shot effectiveness. Code is available at https://github.com/guilhermemr04/scaling-zero-shot-retrieval.git.
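To make the abstract's central contrast concrete, below is a minimal sketch of the two architectures it compares: a dense bi-encoder, which encodes query and document independently (no early interaction), and a monoT5-style cross-encoder reranker, which concatenates them so attention can compare their tokens at every layer (early interaction). The checkpoint names, the prompt template, and the simplified mean pooling follow public conventions from the monoT5 and sentence-transformers lines of work; they are illustrative assumptions, not details taken from this paper's repository.

```python
import torch
from transformers import (AutoTokenizer, AutoModel,
                          T5Tokenizer, T5ForConditionalGeneration)

# --- Dense retriever (bi-encoder): query and document are encoded
# separately, so they only "meet" in the final dot product. ---
# Checkpoint name is an assumption for illustration.
DENSE = "sentence-transformers/msmarco-distilbert-base-v4"
dense_tok = AutoTokenizer.from_pretrained(DENSE)
dense_enc = AutoModel.from_pretrained(DENSE)

def dense_embed(text: str) -> torch.Tensor:
    inputs = dense_tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = dense_enc(**inputs).last_hidden_state  # [1, seq, dim]
    # Plain mean pooling; a simplification of the masked mean pooling
    # sentence-transformers uses (equivalent here with batch size 1).
    return hidden.mean(dim=1).squeeze(0)

def dense_score(query: str, document: str) -> float:
    # Document embeddings can be precomputed offline, hence low latency.
    return torch.dot(dense_embed(query), dense_embed(document)).item()

# --- Reranker (cross-encoder, monoT5 style): the pair is scored in a
# single forward pass over the concatenated text. ---
RERANKER = "castorini/monot5-base-msmarco"  # assumed public checkpoint
rr_tok = T5Tokenizer.from_pretrained(RERANKER)
rr_model = T5ForConditionalGeneration.from_pretrained(RERANKER)

def rerank_score(query: str, document: str) -> float:
    prompt = f"Query: {query} Document: {document} Relevant:"
    inputs = rr_tok(prompt, return_tensors="pt", truncation=True)
    # monoT5 decodes "true"/"false"; relevance = log P("true" | pair).
    true_id = rr_tok.encode("true")[0]
    false_id = rr_tok.encode("false")[0]
    start = torch.tensor([[rr_model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = rr_model(**inputs, decoder_input_ids=start).logits[0, 0]
    return torch.log_softmax(logits[[true_id, false_id]], dim=0)[0].item()

# Usage: rerank candidate passages for one query.
q = "how does model size affect zero-shot retrieval?"
docs = ["Larger rerankers generalize better to unseen domains.",
        "The recipe calls for two cups of flour."]
print(sorted(docs, key=lambda d: rerank_score(q, d), reverse=True)[0])
```

The latency trade-off the abstract mentions follows directly from this structure: dense document embeddings can be indexed ahead of time, while the reranker must run a full forward pass per query-document pair at query time, which is why small distilled models are usually preferred in production despite the zero-shot gains the paper reports for larger rerankers.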
