Paper Title

Less is Less: When Are Snippets Insufficient for Human vs Machine Relevance Estimation?

Paper Authors

Gabriella Kazai, Bhaskar Mitra, Anlei Dong, Nick Craswell, Linjun Yang

Paper Abstract

Traditional information retrieval (IR) ranking models process the full text of documents. Newer models based on Transformers, however, would incur a high computational cost when processing long texts, so they typically use only snippets from the document instead. The model's input, based on a document's URL, title, and snippet (UTS), is akin to the summaries that appear on a search engine results page (SERP) to help searchers decide which result to click. This raises questions about when such summaries are sufficient for relevance estimation by the ranking model or the human assessor, and whether humans and machines benefit from the document's full text in similar ways. To answer these questions, we study human and neural-model-based relevance assessments on 12k query-document pairs sampled from Bing's search logs. We compare changes in the relevance assessments when assessors see only the document summaries versus when they are also exposed to the full text, studying a range of query and document properties, e.g., query type and snippet length. Our findings show that the full text is beneficial for both humans and a BERT model on similar query and document types, e.g., tail and long queries. A closer look, however, reveals that humans and machines respond to the additional input in very different ways. Adding the full text can also hurt the ranker's performance, e.g., for navigational queries.
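To make the UTS vs. full-text contrast concrete, the sketch below (not code from the paper) scores a query against a short URL + title + snippet string and against a longer full-text string with a generic BERT cross-encoder. The checkpoint name, field separator, and maximum input lengths are illustrative assumptions, and the untrained classification head only demonstrates the input format and truncation trade-off, not the authors' trained ranker.

# Minimal sketch, assuming a BERT-style cross-encoder from the Hugging Face
# transformers library. All names and lengths here are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # assumption: any BERT-style checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=1)
model.eval()

def relevance_score(query: str, doc_text: str, max_length: int) -> float:
    """Encode the query and document text as one BERT input pair and return a
    scalar score (the head is untrained here, so the value is for illustration only)."""
    inputs = tokenizer(
        query,
        doc_text,
        truncation=True,       # long full text gets cut off: the cost/quality trade-off
        max_length=max_length,
        return_tensors="pt",
    )
    with torch.no_grad():
        return model(**inputs).logits.squeeze().item()

# UTS input: URL, title, and snippet joined into one short passage
# (the whitespace separator is an assumption, not the paper's format).
url = "https://example.com/page"
title = "Example page title"
snippet = "A short SERP-style summary of the page..."
uts_text = " ".join([url, title, snippet])

# Full-text input: the snippet plus the rest of the document body.
full_text = snippet + " ... remainder of the document body ..."

query = "example query"
print("UTS score:      ", relevance_score(query, uts_text, max_length=128))
print("Full-text score:", relevance_score(query, full_text, max_length=512))

The shorter maximum length for the UTS input reflects the motivation stated in the abstract: Transformer rankers keep inputs short to control computational cost, which is exactly why the summary-only condition matters.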
