Paper Title
An Efficiency Study for SPLADE Models
Paper Authors
Paper Abstract
Latency and efficiency issues are often overlooked when evaluating IR models based on Pretrained Language Models (PLMs), due to the multitude of hardware and software testing scenarios. Nevertheless, efficiency is an important part of such systems and should not be overlooked. In this paper, we focus on improving the efficiency of the SPLADE model, since it has achieved state-of-the-art zero-shot performance and competitive results on TREC collections. SPLADE efficiency can be controlled via a regularization factor, but adjusting this regularization alone has been shown to be insufficient. In order to reduce the latency gap between SPLADE and traditional retrieval systems, we propose several techniques: L1 regularization for queries, a separation of the document and query encoders, FLOPS-regularized middle-training, and the use of faster query encoders. Our benchmark demonstrates that we can drastically improve the efficiency of these models while increasing their performance on in-domain data. To our knowledge, we propose the first neural models that, under the same computing constraints, \textit{achieve latency similar to that of traditional BM25} (less than a 4 ms difference), while retaining \textit{performance similar to that of state-of-the-art single-stage neural rankers} (less than a 10\% MRR@10 reduction) on in-domain data.
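The abstract only names the regularizers; as a rough illustration of what L1 regularization on queries and FLOPS regularization on documents look like in practice, here is a minimal PyTorch sketch. The function names, tensor layout, and the weights lambda_q / lambda_d in the usage comment are our own assumptions for illustration, not code from the paper.

```python
import torch

def l1_regularizer(query_reps: torch.Tensor) -> torch.Tensor:
    # L1 penalty on query term weights: drives individual weights to
    # exactly zero, yielding sparser (hence cheaper) query vectors.
    # query_reps: (batch_size, vocab_size) non-negative SPLADE weights.
    return query_reps.abs().sum(dim=1).mean()

def flops_regularizer(doc_reps: torch.Tensor) -> torch.Tensor:
    # FLOPS penalty (Paria et al., 2020) as used by SPLADE: square of
    # the mean activation of each vocabulary term over the batch,
    # summed over the vocabulary. Terms that fire frequently are
    # penalized most, which is what drives inverted-index query cost.
    return (doc_reps.mean(dim=0) ** 2).sum()

# Hypothetical use inside a training step (lambda_q, lambda_d assumed):
# loss = ranking_loss + lambda_q * l1_regularizer(q) + lambda_d * flops_regularizer(d)
```

The design difference matters: L1 penalizes every weight independently, while FLOPS penalizes vocabulary terms in proportion to how often they activate across the batch, spreading activations over the vocabulary and flattening posting lists.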