Paper Title
Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions
Paper Authors
Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal
Paper Abstract
Prompting-based large language models (LLMs) are surprisingly powerful at generating natural language reasoning steps or Chains-of-Thoughts (CoT) for multi-step question answering (QA). They struggle, however, when the necessary knowledge is either unavailable to the LLM or not up-to-date within its parameters. While using the question to retrieve relevant text from an external knowledge source helps LLMs, we observe that this one-step retrieve-and-read approach is insufficient for multi-step QA. Here, \textit{what to retrieve} depends on \textit{what has already been derived}, which in turn may depend on \textit{what was previously retrieved}. To address this, we propose IRCoT, a new approach for multi-step QA that interleaves retrieval with steps (sentences) in a CoT, guiding the retrieval with CoT and in turn using retrieved results to improve CoT. Using IRCoT with GPT3 substantially improves retrieval (up to 21 points) as well as downstream QA (up to 15 points) on four datasets: HotpotQA, 2WikiMultihopQA, MuSiQue, and IIRC. We observe similar substantial gains in out-of-distribution (OOD) settings as well as with much smaller models such as Flan-T5-large without additional training. IRCoT reduces model hallucination, resulting in factually more accurate CoT reasoning. Code, data, and prompts are available at \url{https://github.com/stonybrooknlp/ircot}
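To make the interleaving concrete, below is a minimal Python sketch of the retrieve-and-reason loop the abstract describes. The helpers \texttt{retrieve(query, k)} and \texttt{generate\_next\_cot\_sentence(question, paragraphs, cot)} are hypothetical placeholders standing in for a paragraph retriever and a prompted LLM; they are not the API of the released code, and the "answer is" termination check is a simple heuristic consistent with the paper's stopping condition.

```python
# Minimal sketch of IRCoT's interleaved retrieve-and-reason loop, as described
# in the abstract. `retrieve` and `generate_next_cot_sentence` are hypothetical
# callables (a retriever and a prompted LLM), not the paper's released API.

def ircot(question, retrieve, generate_next_cot_sentence, k=4, max_steps=8):
    """Alternate CoT generation and retrieval until an answer sentence appears."""
    # Initial retrieval is guided by the question itself.
    paragraphs = retrieve(question, k)
    cot = []
    for _ in range(max_steps):
        # Reason step: extend the CoT by one sentence, conditioned on the
        # question, all paragraphs collected so far, and the CoT so far.
        sentence = generate_next_cot_sentence(question, paragraphs, cot)
        cot.append(sentence)
        # Stop once the model states the answer (a heuristic check for
        # "answer is" in the generated sentence).
        if "answer is" in sentence.lower():
            break
        # Retrieve step: the newest CoT sentence becomes the next query, so
        # what is retrieved depends on what has already been derived.
        for p in retrieve(sentence, k):
            if p not in paragraphs:
                paragraphs.append(p)
    return cot, paragraphs
```

The returned CoT and the paragraphs collected along the way can then be handed to a downstream QA reader, in contrast to the one-step retrieve-and-read approach the abstract argues is insufficient for multi-step questions.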