Title
Natural Language Interfaces to Data
Authors
Abstract
Recent advances in natural language understanding (NLU) and natural language processing (NLP) have renewed interest in natural language interfaces to data, which give non-technical users an easy way to access and query data. While early systems evolved from keyword search and focused on simple factual queries, the complexity of both the input sentences and the generated SQL queries has increased over time. More recently, there has also been considerable focus on conversational interfaces for data analytics, which give a range of non-technical users quick insights into the data. There are three main challenges in natural language querying (NLQ): (1) identifying the entities involved in the user utterance, (2) connecting the different entities in a meaningful way over the underlying data source to interpret user intents, and (3) generating a structured query in the form of SQL or SPARQL. There are two main approaches to interpreting a user's NLQ. Rule-based systems use semantic indices, ontologies, and knowledge graphs (KGs) to identify the entities in the query and the intended relationships between them, and rely on grammars to generate the target queries. With the advances in deep learning (DL)-based language models, many text-to-SQL approaches instead try to interpret the query holistically with DL models. Hybrid approaches, which combine rule-based techniques with DL models to exploit the strengths of both, are also emerging. Conversational interfaces are the natural next step beyond one-shot NLQ, exploiting query context across multiple turns of conversation for disambiguation. In this article, we review the background technologies used in natural language interfaces and survey the different approaches to NLQ. We also describe conversational interfaces for data analytics and discuss several benchmarks used for NLQ research and evaluation.
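As a rough illustration of how the three NLQ challenges fit together in a rule-based setting, the Python sketch below (1) maps words to schema elements and data values through a tiny hand-written semantic index, (2) connects the matched tables with a breadth-first search over foreign keys, and (3) emits a SQL string. The schema, the indices (`METADATA_INDEX`, `VALUE_INDEX`), and the functions `identify_entities`, `join_path`, and `generate_sql` are all hypothetical and are not taken from any system surveyed in the article. Real systems use ontologies, richer indices, and grammars rather than exact token matches.

```python
from collections import deque

# Hypothetical toy schema (illustrative only, not from the article):
#   employees(name, salary, dept_id)   departments(id, dept_name)
# Foreign-key edges: (table_a, column_a, table_b, column_b).
FOREIGN_KEYS = [("employees", "dept_id", "departments", "id")]

# Hypothetical semantic index over metadata: word -> (table, column).
METADATA_INDEX = {
    "salary": ("employees", "salary"),
    "employee": ("employees", "name"),
}
# Hypothetical value index over the data: word -> (table, column, stored value).
VALUE_INDEX = {
    "sales": ("departments", "dept_name", "Sales"),
    "marketing": ("departments", "dept_name", "Marketing"),
}


def identify_entities(utterance):
    """Challenge 1: map tokens to projected columns and to filter values."""
    projections, filters = [], []
    for token in utterance.lower().replace("?", "").split():
        if token in METADATA_INDEX and METADATA_INDEX[token] not in projections:
            projections.append(METADATA_INDEX[token])
        elif token in VALUE_INDEX:
            filters.append(VALUE_INDEX[token])
    return projections, filters


def join_path(start, goal):
    """Challenge 2: BFS over foreign keys; returns (table, ON-condition) join steps."""
    graph = {}
    for ta, ca, tb, cb in FOREIGN_KEYS:
        cond = f"{ta}.{ca} = {tb}.{cb}"
        graph.setdefault(ta, []).append((tb, cond))
        graph.setdefault(tb, []).append((ta, cond))
    queue, seen = deque([(start, [])]), {start}
    while queue:
        table, steps = queue.popleft()
        if table == goal:
            return steps
        for nxt, cond in graph.get(table, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + [(nxt, cond)]))
    raise ValueError(f"no join path from {start} to {goal}")


def generate_sql(utterance):
    """Challenge 3: assemble SELECT ... FROM ... JOIN ... WHERE from the entities."""
    projections, filters = identify_entities(utterance)
    if not projections:
        raise ValueError("no column mentioned in the utterance")
    root = projections[0][0]  # the first projected table anchors the FROM clause
    joins, joined = [], {root}
    for table, *_ in projections + filters:
        for nxt, cond in join_path(root, table):
            if nxt not in joined:
                joined.add(nxt)
                joins.append(f"JOIN {nxt} ON {cond}")
    sql = "SELECT " + ", ".join(f"{t}.{c}" for t, c in projections)
    sql += f" FROM {root}"
    for join in joins:
        sql += f" {join}"
    if filters:
        # Values are inlined for readability; a real system would bind parameters.
        sql += " WHERE " + " AND ".join(f"{t}.{c} = '{v}'" for t, c, v in filters)
    return sql


print(generate_sql("What is the salary of each employee in Sales?"))
# SELECT employees.salary, employees.name FROM employees JOIN departments ON employees.dept_id = departments.id WHERE departments.dept_name = 'Sales'
```

The word "Sales" is resolved as a data value rather than a column, which is why it becomes a `WHERE` filter and forces a join to `departments`. A DL-based text-to-SQL model would replace these hand-written steps with a single learned mapping, and a hybrid system might keep the value lookup and join inference while letting a model handle the rest.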