Paper title
HopRetriever: Retrieve Hops over Wikipedia to Answer Complex Questions
Paper authors
Paper abstract
Collecting supporting evidence from large text corpora (e.g., Wikipedia) is a major challenge for open-domain Question Answering (QA). For multi-hop open-domain QA in particular, scattered pieces of evidence must be gathered together to support answer extraction. In this paper, we propose a new retrieval target, the hop, to collect hidden reasoning evidence from Wikipedia for complex question answering. Specifically, a hop is defined as the combination of a hyperlink and the corresponding outbound link document. The hyperlink is encoded as a mention embedding, which models the structured knowledge of how the outbound link entity is mentioned in its textual context, and the corresponding outbound link document is encoded as a document embedding, which represents the unstructured knowledge within it. Accordingly, we build HopRetriever, which retrieves hops over Wikipedia to answer complex questions. Experiments on the HotpotQA dataset demonstrate that HopRetriever outperforms previously published evidence retrieval methods by large margins. Moreover, our approach yields quantifiable interpretations of the evidence collection process.