Paper Title
Would You Ask it that Way? Measuring and Improving Question Naturalness for Knowledge Graph Question Answering
Paper Authors
Paper Abstract
Knowledge graph question answering (KGQA) facilitates information access by leveraging structured data without requiring formal query language expertise from the user. Instead, users can express their information needs by simply asking their questions in natural language (NL). Datasets used to train KGQA models that would provide such a service are expensive to construct, both in terms of expert and crowdsourced labor. Typically, crowdsourced labor is used to improve template-based pseudo-natural questions generated from formal queries. However, the resulting datasets often fall short of representing genuinely natural and fluent language. In the present work, we investigate ways to characterize and remedy these shortcomings. We create the IQN-KGQA test collection by sampling questions from existing KGQA datasets and evaluating them with regard to five different aspects of naturalness. Then, the questions are rewritten to improve their fluency. Finally, the performance of existing KGQA models is compared on the original and rewritten versions of the NL questions. We find that some KGQA systems fare worse when presented with more realistic formulations of NL questions. The IQN-KGQA test collection is a resource to help evaluate KGQA systems in a more realistic setting. The construction of this test collection also sheds light on the challenges of constructing large-scale KGQA datasets with genuinely NL questions.