Paper Title
Query-as-context Pre-training for Dense Passage Retrieval
Paper Authors
Paper Abstract
Recently, methods have been developed to improve the performance of dense passage retrieval by using context-supervised pre-training. These methods simply consider two passages from the same document to be relevant, without taking into account the possibility of weakly correlated pairs. This paper therefore proposes query-as-context pre-training, a simple yet effective pre-training technique that alleviates this issue. Query-as-context pre-training assumes that a query derived from a passage is more likely to be relevant to that passage, and accordingly forms passage-query pairs. These passage-query pairs are then used in contrastive or generative context-supervised pre-training. The pre-trained models are evaluated on large-scale passage retrieval benchmarks and out-of-domain zero-shot benchmarks. Experimental results show that query-as-context pre-training brings considerable gains while also speeding up training, demonstrating its effectiveness and efficiency. Our code will be available at https://github.com/caskcsg/ir/tree/main/cotmae-qc.
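To make the idea concrete, below is a minimal sketch of how passage-query pairs might be formed and used in a contrastive pre-training step. The abstract does not specify the query generator or encoder; this sketch assumes a doc2query-style T5 generator (the `doc2query/msmarco-t5-base-v1` checkpoint and `bert-base-uncased` encoder are illustrative stand-ins, not the paper's actual setup), and an in-batch-negative InfoNCE objective as the contrastive part.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer, T5ForConditionalGeneration

# --- Step 1: derive a query from each passage (query-as-context pair) ---
# Illustrative generator checkpoint; the abstract does not name one.
qg_name = "doc2query/msmarco-t5-base-v1"
qg_tok = AutoTokenizer.from_pretrained(qg_name)
qg_model = T5ForConditionalGeneration.from_pretrained(qg_name)

def make_query(passage: str) -> str:
    """Generate a pseudo-query for a passage. The (passage, query) pair
    stands in for the (passage, passage) pair of prior context-supervised
    pre-training, on the assumption that a query derived from a passage
    is likely relevant to it."""
    inputs = qg_tok(passage, return_tensors="pt", truncation=True, max_length=384)
    out = qg_model.generate(**inputs, max_length=64, do_sample=True, top_k=10)
    return qg_tok.decode(out[0], skip_special_tokens=True)

# --- Step 2: one contrastive pre-training step on passage-query pairs ---
enc_name = "bert-base-uncased"  # illustrative stand-in encoder
enc_tok = AutoTokenizer.from_pretrained(enc_name)
encoder = AutoModel.from_pretrained(enc_name)

def embed(texts: list[str]) -> torch.Tensor:
    batch = enc_tok(texts, padding=True, truncation=True, return_tensors="pt")
    return encoder(**batch).last_hidden_state[:, 0]  # [CLS] vectors

def contrastive_loss(passages: list[str], queries: list[str],
                     temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE with in-batch negatives: each passage should score highest
    against its own generated query."""
    p, q = embed(passages), embed(queries)
    logits = F.normalize(p, dim=-1) @ F.normalize(q, dim=-1).T / temperature
    labels = torch.arange(len(passages))
    return F.cross_entropy(logits, labels)

passages = [
    "Dense retrieval encodes queries and passages into vectors and ranks "
    "passages by vector similarity.",
    "Pre-training with context supervision treats co-document passages as "
    "related training pairs.",
]
queries = [make_query(p) for p in passages]
loss = contrastive_loss(passages, queries)
loss.backward()  # update encoder weights in a real training loop
```

Note that the generative variant mentioned in the abstract would instead use the pair in a reconstruction-style objective; the contrastive form above is shown only because it is the simpler of the two to sketch.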