Paper Title
Expansion via Prediction of Importance with Contextualization
Paper Authors
Paper Abstract
The identification of relevance with little textual context is a primary challenge in passage retrieval. We address this problem with a representation-based ranking approach that: (1) explicitly models the importance of each term using a contextualized language model; (2) performs passage expansion by propagating the importance to similar terms; and (3) grounds the representations in the lexicon, making them interpretable. Passage representations can be pre-computed at index time to reduce query-time latency. We call our approach EPIC (Expansion via Prediction of Importance with Contextualization). We show that EPIC significantly outperforms prior importance-modeling and document expansion approaches. We also observe that the performance is additive with the current leading first-stage retrieval methods, further narrowing the gap between inexpensive and cost-prohibitive passage ranking approaches. Specifically, EPIC achieves an MRR@10 of 0.304 on the MS-MARCO passage ranking dataset with 78ms average query latency on commodity hardware. We also find that the latency is further reduced to 68ms by pruning document representations, with virtually no difference in effectiveness.
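To make the abstract's description more concrete, the sketch below illustrates the general shape of a representation-based scheme like the one described: query terms receive importance weights, passages get a vocabulary-sized "expanded" vector in which importance is also propagated to similar terms, passage vectors can be pre-computed and pruned at index time, and relevance is a simple dot product over query terms. This is not the authors' implementation; the toy vocabulary, the uniform `query_term_importance` weights, and the hand-written `expansion` mapping are all illustrative stand-ins for what EPIC derives from a contextualized language model.

```python
# Minimal sketch (assumptions noted above) of importance-weighted, lexicon-grounded
# passage representations with expansion to similar terms and top-k pruning.
import numpy as np

VOCAB = {"cheap": 0, "flights": 1, "airfare": 2, "tickets": 3, "to": 4, "rome": 5}
V = len(VOCAB)

def query_term_importance(query_terms):
    """Stand-in: EPIC predicts these weights with a contextualized LM.
    Uniform weights are used here purely for illustration."""
    return {t: 1.0 / len(query_terms) for t in query_terms}

def passage_representation(passage_terms, expansion):
    """Vocabulary-sized vector: observed terms get importance mass, and
    similar terms (the expansion) receive propagated mass."""
    rep = np.zeros(V)
    for t in passage_terms:
        rep[VOCAB[t]] += 1.0           # stand-in for predicted term importance
    for src, (dst, w) in expansion.items():
        if src in passage_terms:
            rep[VOCAB[dst]] += w       # propagate importance to a similar term
    return rep

def prune(rep, k):
    """Keep only the top-k entries of a passage vector (size/latency trade-off)."""
    pruned = np.zeros_like(rep)
    top = np.argsort(rep)[-k:]
    pruned[top] = rep[top]
    return pruned

def score(query_terms, passage_rep):
    """Relevance as a dot product between query importance and the passage vector."""
    weights = query_term_importance(query_terms)
    return sum(w * passage_rep[VOCAB[t]] for t, w in weights.items() if t in VOCAB)

# Index time: pre-compute (and optionally prune) passage vectors.
expansion = {"flights": ("airfare", 0.5)}            # toy similarity mapping
doc = passage_representation(["cheap", "flights", "to", "rome"], expansion)
doc = prune(doc, k=5)

# Query time: only the query-side importance model needs to run.
print(score(["cheap", "airfare"], doc))              # expansion lets 'airfare' match
```

Because the passage side is just a sparse vector grounded in the lexicon, it can be stored at index time and inspected term by term, which is what makes the representations interpretable and keeps query-time work limited to the query encoder and a dot product.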