Paper Title
Computer Science Named Entity Recognition in the Open Research Knowledge Graph
Paper Authors
Paper Abstract
Domain-specific named entity recognition (NER) on Computer Science (CS) scholarly articles is an information extraction task that is arguably more challenging than NER in the general domain, because of the varied annotation aims that can beset it, and it has also been less studied. Given the significant progress already made on NER, we believe that scholarly domain-specific NER will receive increasing attention in the years to come. Currently, progress on CS NER, the focus of this work, is hampered in part by its recency and by the lack of a standardized annotation aim for scientific entities/terms. This work proposes a standardized task by defining a set of seven contribution-centric scholarly entities for CS NER, viz., research problem, solution, resource, language, tool, method, and dataset. Building on this definition, its main contributions are threefold. First, it combines existing CS NER resources whose annotations focus on the set, or a subset, of the contribution-centric scholarly entities we consider. Second, noting the need for big data to train neural NER models, it supplies thousands of additional contribution-centric entity annotations from article titles and abstracts, thus releasing a large, cumulative, novel resource for CS NER. Finally, it trains a sequence labeling CS NER model inspired by state-of-the-art neural architectures from the general domain NER task. Throughout the work, several practical considerations are noted that can be useful to information technology designers of digital libraries.
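To make the sequence labeling framing concrete, the following is a minimal sketch of how the seven entity types could be turned into a token-level label set. It assumes a BIO tagging scheme, which the abstract does not specify; the label spellings, the helper names `bio_labels` and `spans_to_bio`, and the example title are all hypothetical and chosen only for illustration.

```python
# Seven contribution-centric entity types named in the abstract.
# The snake_case spellings are an assumption made for this sketch.
ENTITY_TYPES = [
    "research_problem", "solution", "resource",
    "language", "tool", "method", "dataset",
]


def bio_labels(entity_types):
    """Build a BIO label set: one 'O' label plus B-/I- labels per type."""
    labels = ["O"]
    for etype in entity_types:
        labels.append(f"B-{etype}")
        labels.append(f"I-{etype}")
    return labels


def spans_to_bio(tokens, spans):
    """Convert (start, end_exclusive, type) token spans into BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


# Hypothetical title and annotations, not taken from the released dataset.
tokens = ["Named", "entity", "recognition", "with", "BERT"]
spans = [(0, 3, "research_problem"), (4, 5, "method")]
tags = spans_to_bio(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Under this assumed scheme, a sequence labeling model would predict one of 15 labels per token (the `O` label plus a begin and an inside label for each of the seven types).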