Conversations with Search Engines: SERP-based Conversational Response Generation

Title


Conversations with Search Engines: SERP-based Conversational Response Generation

Authors

Pengjie Ren, Zhumin Chen, Zhaochun Ren, Evangelos Kanoulas, Christof Monz, Maarten de Rijke

Abstract


In this paper, we address the problem of answering complex information needs by having conversations with search engines, in the sense that users can express their queries in natural language and directly receive the information they need from a short system response in a conversational manner. Recently, there have been some attempts towards a similar goal, e.g., studies on Conversational Agents (CAs) and Conversational Search (CS). However, they either do not address complex information needs, or they are limited to the development of conceptual frameworks and/or laboratory-based user studies. We pursue two goals in this paper: (1) the creation of a suitable dataset, the Search as a Conversation (SaaC) dataset, for the development of pipelines for conversations with search engines, and (2) the development of a state-of-the-art pipeline for conversations with search engines, Conversations with Search Engines (CaSE), using this dataset. SaaC is built on a multi-turn conversational search dataset, for which we further employ workers from a crowdsourcing platform to summarize each relevant passage into a short, conversational response. CaSE enhances the state of the art by introducing a supporting token identification module and a prior-aware pointer generator, which enable us to generate more accurate responses. We carry out experiments showing that CaSE outperforms strong baselines. We also conduct extensive analyses on the SaaC dataset to show where there is room for further improvement beyond CaSE. Finally, we release the SaaC dataset, along with the code for CaSE and all models used for comparison, to facilitate future research on this topic.
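The abstract names a prior-aware pointer generator but does not define it. As a hedged illustration only, the sketch below shows the generic pointer-generator mixing step for one decoding position: the final word distribution blends a vocabulary softmax with a copy distribution built from attention over source tokens, weighted by a generation probability `p_gen`. The optional `prior` argument (per-source-token weights, e.g. something like supporting-token scores) and the choice to multiply it into the attention before renormalizing are assumptions made for illustration; they are not CaSE's documented mechanism. The function name `pointer_generator_distribution` is hypothetical.

```python
import numpy as np


def pointer_generator_distribution(p_vocab, attention, source_ids, p_gen, prior=None):
    """Mix generation and copy distributions for one decoding step.

    p_vocab:    (V,) softmax over the output vocabulary.
    attention:  (T,) attention weights over the T source tokens (sum to 1).
    source_ids: (T,) vocabulary id of each source token.
    p_gen:      scalar in [0, 1], probability of generating instead of copying.
    prior:      optional (T,) non-negative per-token weights. HYPOTHETICAL:
                an illustrative stand-in for a token-level prior, not CaSE's method.
    """
    p_vocab = np.asarray(p_vocab, dtype=float)
    attn = np.asarray(attention, dtype=float)

    if prior is not None:
        # Assumption for illustration: bias attention toward tokens the prior
        # favours, then renormalize so the copy distribution still sums to 1.
        attn = attn * np.asarray(prior, dtype=float)
        total = attn.sum()
        if total <= 0:
            raise ValueError("prior removes all attention mass")
        attn = attn / total

    # Scatter-add attention into vocabulary space; repeated source tokens
    # accumulate their attention mass on the same vocabulary id.
    copy_dist = np.zeros_like(p_vocab)
    np.add.at(copy_dist, np.asarray(source_ids), attn)

    # Final distribution: p_gen * generate + (1 - p_gen) * copy.
    return p_gen * p_vocab + (1.0 - p_gen) * copy_dist


if __name__ == "__main__":
    out = pointer_generator_distribution(
        p_vocab=[0.5, 0.3, 0.2, 0.0],  # toy 4-word vocabulary
        attention=[0.6, 0.4],          # two source tokens
        source_ids=[3, 1],             # their vocabulary ids
        p_gen=0.5,
    )
    print(out)  # [0.25 0.35 0.1  0.3 ]
```

Because copying works through vocabulary ids, a word that never appears in `p_vocab`'s high-probability region (id 3 in the toy example) can still be produced by copying it from the source passage, which is the usual motivation for pointer-generator decoders in summarization-style response generation.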
