Paper Title

FarsBase-KBP: A Knowledge Base Population System for the Persian Knowledge Graph

Authors

Asgari-Bidhendi, Majid; Janfada, Behrooz; Minaei-Bidgoli, Behrouz

Abstract

While most knowledge bases already support English, there is only one knowledge base for the Persian language, FarsBase, which is created automatically from semi-structured web information. Unlike English knowledge bases such as Wikidata, which have tremendous community support, a knowledge base like FarsBase must be populated from automatically extracted knowledge. Knowledge base population lets FarsBase keep growing in size as the system continues working. In this paper, we present a knowledge base population system for the Persian language, which extracts knowledge from unlabeled raw text crawled from the Web. The proposed system consists of a set of state-of-the-art modules, such as an entity linking module and information and relation extraction modules designed for FarsBase. Moreover, a canonicalization system is introduced to link extracted relations to FarsBase properties. The system then uses knowledge fusion techniques, with minimal intervention from human experts, to integrate and filter the proper knowledge instances extracted by each module. To evaluate the performance of the presented knowledge base population system, we present the first gold dataset for benchmarking knowledge base population in the Persian language, which consists of 22,015 FarsBase triples verified by human experts. The evaluation results demonstrate the efficiency of the proposed system.
