Paper Title

ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT

Paper Authors

Omar Khattab, Matei Zaharia

Paper Abstract

Recent progress in Natural Language Understanding (NLU) is driving fast-paced advances in Information Retrieval (IR), largely owed to fine-tuning deep language models (LMs) for document ranking. While remarkably effective, the ranking models based on these LMs increase computational cost by orders of magnitude over prior approaches, particularly as they must feed each query-document pair through a massive neural network to compute a single relevance score. To tackle this, we present ColBERT, a novel ranking model that adapts deep LMs (in particular, BERT) for efficient retrieval. ColBERT introduces a late interaction architecture that independently encodes the query and the document using BERT and then employs a cheap yet powerful interaction step that models their fine-grained similarity. By delaying and yet retaining this fine-granular interaction, ColBERT can leverage the expressiveness of deep LMs while simultaneously gaining the ability to pre-compute document representations offline, considerably speeding up query processing. Beyond reducing the cost of re-ranking the documents retrieved by a traditional model, ColBERT's pruning-friendly interaction mechanism enables leveraging vector-similarity indexes for end-to-end retrieval directly from a large document collection. We extensively evaluate ColBERT using two recent passage search datasets. Results show that ColBERT's effectiveness is competitive with existing BERT-based models (and outperforms every non-BERT baseline), while executing two orders-of-magnitude faster and requiring four orders-of-magnitude fewer FLOPs per query.
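The "cheap yet powerful interaction step" described in the abstract is a sum of maximum similarities: each query token embedding is matched against its most similar document token embedding, and those per-token maxima are summed into the relevance score. Because document embeddings enter only through this operation, they can be pre-computed offline and pruned with vector-similarity indexes. The sketch below illustrates the scoring step with NumPy; the function name is ours, and the random unit vectors merely stand in for contextualized BERT output embeddings (the 128-dimensional size is illustrative).

```python
import numpy as np

def late_interaction_score(query_embs: np.ndarray, doc_embs: np.ndarray) -> float:
    """Late interaction via MaxSim: for each query token embedding, take the
    maximum similarity over all document token embeddings, then sum.

    query_embs: (num_query_tokens, dim), rows L2-normalized
    doc_embs:   (num_doc_tokens, dim),   rows L2-normalized
    """
    # (num_query_tokens, num_doc_tokens) cosine similarities via dot products
    sim = query_embs @ doc_embs.T
    # Best-matching document token per query token, summed over query tokens.
    return float(sim.max(axis=1).sum())

# Toy example: random unit vectors in place of real BERT embeddings.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 128));  q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(50, 128)); d /= np.linalg.norm(d, axis=1, keepdims=True)
print(late_interaction_score(q, d))
```

Note how the query side touches the document side only through the final matrix product: this is what lets the document matrices be encoded once, stored, and reused across all future queries.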
