Paper title
O-Dang! The Ontology of Dangerous Speech Messages
Paper authors
Paper abstract
Within the NLP community, a considerable number of language resources are created, annotated, and released every day with the aim of studying specific linguistic phenomena. Although various attempts have been made to organize such resources, systematic methods and interoperability between resources are still lacking. Furthermore, the most common practice when storing linguistic information is still to rely on a single "gold standard", which is in contrast with recent trends in NLP that stress the importance of different subjectivities and points of view when training machine learning and deep learning methods. In this paper we present O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and interoperable Knowledge Graph (KG) for the collection of linguistically annotated data. O-Dang! is designed to gather and organize Italian datasets into a structured KG, according to the principles shared within the Linguistic Linked Open Data community. The ontology has also been designed to account for a perspectivist approach, since it provides a model for encoding both gold standard and single-annotator labels in the KG. The paper is structured as follows. Section 1 outlines the motivations of our work. Section 2 describes the O-Dang! Ontology, which provides a common semantic model for the integration of datasets in the KG. Section 3 presents the Ontology Population stage, which adds information about corpora, users, and annotations. Finally, Section 4 provides an analysis of offensiveness across corpora as a first case study for the resource.