Paper Title

Optimizing Test-Time Query Representations for Dense Retrieval

Authors

Mujeen Sung, Jungsoo Park, Jaewoo Kang, Danqi Chen, Jinhyuk Lee

Abstract

Recent developments of dense retrieval rely on quality representations of queries and contexts from pre-trained query and context encoders. In this paper, we introduce TOUR (Test-Time Optimization of Query Representations), which further optimizes instance-level query representations guided by signals from test-time retrieval results. We leverage a cross-encoder re-ranker to provide fine-grained pseudo labels over retrieval results and iteratively optimize query representations with gradient descent. Our theoretical analysis reveals that TOUR can be viewed as a generalization of the classical Rocchio algorithm for pseudo relevance feedback, and we present two variants that leverage pseudo-labels as hard binary or soft continuous labels. We first apply TOUR on phrase retrieval with our proposed phrase re-ranker, and also evaluate its effectiveness on passage retrieval with an off-the-shelf re-ranker. TOUR greatly improves end-to-end open-domain question answering accuracy, as well as passage retrieval performance. TOUR also consistently improves direct re-ranking by up to 2.0% while running 1.3-2.4x faster with an efficient implementation.
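The optimization loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `optimize_query`, the cross-entropy loss, the learning rate, and the fixed candidate set are all assumptions made for illustration, and the re-ranker scores are passed in as plain numbers rather than computed by a real cross-encoder. The sketch turns re-ranker scores into soft or hard pseudo labels and moves the query vector by gradient descent so that its retrieval distribution over the candidates matches those labels. The resulting update adds label-weighted candidate vectors and subtracts currently over-scored ones, which suggests why a connection to Rocchio-style feedback is plausible.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()


def optimize_query(q, cands, rerank_scores, hard=False, lr=0.5, steps=10):
    """Hypothetical sketch of test-time query optimization.

    q             : (d,) initial query vector from the query encoder
    cands         : (k, d) vectors of the top-k retrieved items
    rerank_scores : (k,) scores from a re-ranker (assumed to be given)
    hard          : if True, use a one-hot label on the best re-ranked item;
                    otherwise use a softmax over the re-ranker scores
    """
    rerank_scores = np.asarray(rerank_scores, dtype=float)
    if hard:
        target = np.zeros(len(cands))
        target[np.argmax(rerank_scores)] = 1.0
    else:
        target = softmax(rerank_scores)

    q = np.asarray(q, dtype=float).copy()
    for _ in range(steps):
        # Retrieval distribution induced by inner-product scores.
        pred = softmax(cands @ q)
        # Gradient of cross-entropy(target, pred) with respect to q.
        grad = cands.T @ (pred - target)
        q -= lr * grad
    return q


if __name__ == "__main__":
    cands = np.array([[1.0, 0.0], [0.0, 1.0]])
    q0 = np.array([1.0, 0.0])          # initially prefers candidate 0
    scores = np.array([0.0, 5.0])      # re-ranker prefers candidate 1
    q1 = optimize_query(q0, cands, scores)
    print("scores before:", cands @ q0)
    print("scores after: ", cands @ q1)
```

One simplification to note: the sketch keeps the candidate set fixed, whereas the abstract describes an iterative procedure guided by test-time retrieval results, so a fuller version would presumably re-retrieve with the updated query between steps.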
