Paper Title

Global Contentious Politics Database (GLOCON) Annotation Manuals

Authors

Duruşan, Fırat; Hürriyetoğlu, Ali; Yörük, Erdem; Mutlu, Osman; Yoltar, Çağrı; Gürel, Burak; Comin, Alvaro

Abstract

The database was created using automated text processing tools that detect whether a news article contains a protest event, locate protest information within the article, and extract pieces of information about the detected protest events. The automated tools are trained and tested on the GLOCON Gold Standard Corpus (GSC), which contains news articles from multiple sources in each focus country. The articles in the GSC were manually coded by skilled annotators, in both classification and extraction tasks, with the utmost accuracy and consistency that automated tool development demands. To ensure this accuracy and consistency, the annotation manuals in this document lay out the rules according to which annotators code the news articles. Annotators refer to the manuals at all times for all annotation tasks and apply the rules that they contain. The content of the annotation manuals builds on the general principles and standards of linguistic annotation laid out in other prominent annotation manuals such as ACE, CAMEO, and TimeML. These principles, however, have been adapted, or rather heavily modified, to accommodate the social scientific concepts and variables employed in the EMW project. The manuals were shaped through a long trial-and-error process that accompanied the annotation of the GSC. They owe much of their current form to the meticulous work and invaluable feedback of highly specialized teams of annotators, whose diligence and expertise greatly increased the quality of the corpus.
