论文标题
开放域问题回答的低资源密集检索:一项全面的调查
Low-Resource Dense Retrieval for Open-Domain Question Answering: A Comprehensive Survey
论文作者
论文摘要
基于强大的预训练语言模型(PLM)的密集检索方法(DR)方法取得了重大进步,并已成为现代开放式域问答系统的关键组成部分。但是,他们需要大量的手动注释才能进行竞争性,这是不可行的。为了解决这一问题,越来越多的研究作品最近着重于在低资源场景下提高DR绩效。这些作品在培训所需的资源和采用各种技术的资源方面有所不同。了解这种差异对于在特定的低资源场景下选择正确的技术至关重要。为了促进这种理解,我们提供了针对低资源DR的主流技术的彻底结构化概述。根据他们所需的资源,我们将这些技术分为三个主要类别:(1)仅需要文档; (2)需要文件和问题; (3)需要文档和提问对。对于每种技术,我们都介绍其一般形式的算法,突出显示开放的问题以及利弊。概述了有希望的方向以供将来的研究。
Dense retrieval (DR) approaches based on powerful pre-trained language models (PLMs) achieved significant advances and have become a key component for modern open-domain question-answering systems. However, they require large amounts of manual annotations to perform competitively, which is infeasible to scale. To address this, a growing body of research works have recently focused on improving DR performance under low-resource scenarios. These works differ in what resources they require for training and employ a diverse set of techniques. Understanding such differences is crucial for choosing the right technique under a specific low-resource scenario. To facilitate this understanding, we provide a thorough structured overview of mainstream techniques for low-resource DR. Based on their required resources, we divide the techniques into three main categories: (1) only documents are needed; (2) documents and questions are needed; and (3) documents and question-answer pairs are needed. For every technique, we introduce its general-form algorithm, highlight the open issues and pros and cons. Promising directions are outlined for future research.