RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering

Paper Title

RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering

Authors

Yingqi Qu, Yuchen Ding, Jing Liu, Kai Liu, Ruiyang Ren, Wayne Xin Zhao, Daxiang Dong, Hua Wu, Haifeng Wang

Abstract

In open-domain question answering, dense passage retrieval has become a new paradigm to retrieve relevant passages for finding answers. Typically, the dual-encoder architecture is adopted to learn dense representations of questions and passages for semantic matching. However, it is difficult to effectively train a dual-encoder due to challenges including the discrepancy between training and inference, the existence of unlabeled positives, and limited training data. To address these challenges, we propose an optimized training approach, called RocketQA, to improve dense passage retrieval. We make three major technical contributions in RocketQA, namely cross-batch negatives, denoised hard negatives, and data augmentation. The experimental results show that RocketQA significantly outperforms previous state-of-the-art models on both MSMARCO and Natural Questions. We also conduct extensive experiments to examine the effectiveness of the three strategies in RocketQA. In addition, we demonstrate that the performance of end-to-end QA can be improved based on our RocketQA retriever.
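The abstract names the dual-encoder setup and three training strategies but does not spell them out. Below is a minimal pure-Python sketch of how these ideas could look. It is not RocketQA's released code. It covers:

- inner-product scoring between question and passage embeddings;
- a softmax loss with in-batch negatives versus cross-batch negatives, where passages from all devices are shared;
- filtering retrieved hard negatives with a cross-encoder score;
- pseudo-labeling unlabeled questions with a cross-encoder score.

Several pieces are assumptions made for illustration. The embeddings are precomputed toy vectors, and "devices" are simulated as a list of `(questions, passages)` batches. The function names, `toy_cross_score`, and the thresholds are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

# Hypothetical sketch: not RocketQA's implementation; names and thresholds are illustrative.
Vector = Sequence[float]
Batch = Tuple[List[Vector], List[Vector]]  # (question vectors, aligned positive passage vectors)
T = TypeVar("T")


def dot(u: Vector, v: Vector) -> float:
    """Inner-product similarity between a question and a passage embedding."""
    return sum(a * b for a, b in zip(u, v))


def softmax_nll(questions: List[Vector], passages: List[Vector]) -> float:
    """Mean negative log-likelihood: passages[i] is the positive for questions[i];
    every other passage in the list acts as a negative."""
    total = 0.0
    for i, q in enumerate(questions):
        scores = [dot(q, p) for p in passages]
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[i]
    return total / len(questions)


def in_batch_loss(batches: List[Batch]) -> float:
    """Each simulated device sees only its own B passages: B - 1 negatives per question."""
    losses = [softmax_nll(qs, ps) for qs, ps in batches]
    return sum(losses) / len(losses)


def cross_batch_loss(batches: List[Batch]) -> float:
    """Passages from all A devices are shared: A*B - 1 negatives per question."""
    all_q = [q for qs, _ in batches for q in qs]
    all_p = [p for _, ps in batches for p in ps]  # same order keeps positives aligned
    return softmax_nll(all_q, all_p)


def denoise_hard_negatives(question: str, candidates: List[T],
                           cross_score: Callable[[str, T], float],
                           threshold: float) -> List[T]:
    """Drop retrieved candidates that a cross-encoder scores at or above `threshold`,
    since they may be unlabeled positives rather than true negatives."""
    return [p for p in candidates if cross_score(question, p) < threshold]


def pseudo_label(question: str, candidates: List[T],
                 cross_score: Callable[[str, T], float],
                 pos_threshold: float, neg_threshold: float) -> Tuple[List[T], List[T]]:
    """Label passages for an unlabeled question: confident highs as positives,
    confident lows as negatives; passages in between are left unused."""
    scored = [(p, cross_score(question, p)) for p in candidates]
    positives = [p for p, s in scored if s >= pos_threshold]
    negatives = [p for p, s in scored if s <= neg_threshold]
    return positives, negatives


# Toy usage: two simulated devices with batch size 2.
batches = [
    ([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]),
    ([[1.0, 1.0], [1.0, -1.0]], [[0.8, 0.7], [0.7, -0.8]]),
]
print(in_batch_loss(batches), cross_batch_loss(batches))

TOY_SCORES = {"p1": 0.95, "p2": 0.2, "p3": 0.6}


def toy_cross_score(question: str, passage: str) -> float:
    """Stand-in for a cross-encoder relevance score (hypothetical)."""
    return TOY_SCORES[passage]


print(denoise_hard_negatives("q", ["p1", "p2", "p3"], toy_cross_score, 0.9))  # ['p2', 'p3']
print(pseudo_label("q", ["p1", "p2", "p3"], toy_cross_score, 0.9, 0.3))  # (['p1'], ['p2'])
```

The cross-batch variant scores each question against a superset of the in-batch passages, with the same positive. Its per-question loss is therefore never lower than the in-batch loss. The extra negatives come from sharing passages across devices rather than from enlarging any single device's batch.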
