Paper Title
Effective FAQ Retrieval and Question Matching With Unsupervised Knowledge Injection
Paper Authors
Paper Abstract
Frequently asked question (FAQ) retrieval, which aims to provide information on frequent questions or concerns, has far-reaching applications in many areas, where a collection of question-answer (Q-A) pairs compiled a priori can be employed to retrieve an appropriate answer in response to a user's query that is likely to recur frequently. To this end, predominant approaches to FAQ retrieval typically rank question-answer pairs by considering the similarity between the query and a question (q-Q), the relevance between the query and the associated answer of a question (q-A), or a combination of the clues gathered from the q-Q similarity measure and the q-A relevance measure. In this paper, we extend this line of research by combining the clues gathered from the q-Q similarity measure and the q-A relevance measure while injecting extra word interaction information, distilled from a generic (open-domain) knowledge base, into a contextual language model for inferring the q-A relevance. Furthermore, we explore capitalizing on domain-specific, topically relevant relations between words in an unsupervised manner, which act as a surrogate for supervised domain-specific knowledge base information. This enables the model to equip sentence representations with knowledge about domain-specific and topically relevant relations among words, thereby providing a better q-A relevance measure. We evaluate variants of our approach on a publicly available Chinese FAQ dataset, and further apply and contextualize it to a large-scale question-matching task, which aims to search a QA dataset for questions that have a similar intent to an input query. Extensive experimental results on these two datasets confirm the promising performance of the proposed approach relative to some state-of-the-art ones.
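To make the idea of combining q-Q similarity and q-A relevance concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes a simple linear interpolation with a hypothetical weight `alpha`, and it uses token-overlap (Jaccard) similarity as a stand-in for both scorers; the abstract does not specify the actual scoring functions or how the clues are combined, and the knowledge-injected contextual language model is not reproduced here. The names `jaccard` and `rank_faq` are illustrative only.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a placeholder for a real scorer (assumption)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def rank_faq(
    query: str,
    faq: List[Tuple[str, str]],
    alpha: float = 0.5,          # hypothetical interpolation weight
    qq_score: Scorer = jaccard,  # stand-in for a q-Q similarity measure
    qa_score: Scorer = jaccard,  # stand-in for a q-A relevance measure
) -> List[Tuple[float, str, str]]:
    """Rank (question, answer) pairs by an interpolated q-Q / q-A score."""
    scored = []
    for question, answer in faq:
        score = alpha * qq_score(query, question) + (1.0 - alpha) * qa_score(query, answer)
        scored.append((score, question, answer))
    # Highest combined score first.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


if __name__ == "__main__":
    faq_pairs = [
        ("how do i reset my password", "open settings and choose reset password"),
        ("what are the opening hours", "we are open from nine to five"),
    ]
    for score, question, _ in rank_faq("reset my password", faq_pairs):
        print(f"{score:.3f}  {question}")
```

In a setup closer to the one the abstract describes, `qa_score` would be replaced by a contextual language model that scores query-answer relevance, while `qq_score` would be whichever q-Q similarity measure is in use; both are passed in as callables so that either can be swapped without changing the ranking logic.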