Paper Title
To Adapt or to Annotate: Challenges and Interventions for Domain Adaptation in Open-Domain Question Answering
Paper Authors
Paper Abstract
Recent advances in open-domain question answering (ODQA) have demonstrated impressive accuracy on standard Wikipedia-style benchmarks. However, it is less clear how robust these models are and how well they perform when applied to real-world applications in drastically different domains. While there has been some work investigating how well ODQA models perform when tested for out-of-domain (OOD) generalization, these studies have been conducted only under conservative shifts in data distribution and typically focus on a single component (i.e., retrieval) rather than an end-to-end system. In response, we propose a more realistic and challenging domain shift evaluation setting and, through extensive experiments, study end-to-end model performance. We find that not only do models fail to generalize, but high retrieval scores often still yield poor answer prediction accuracy. We then categorize different types of shifts and propose techniques that, when presented with a new dataset, predict whether intervention methods are likely to be successful. Finally, using insights from this analysis, we propose and evaluate several intervention methods, which improve end-to-end answer F1 score by up to 24 points.