Paper title
Placing M-Phasis on the Plurality of Hate: A Feature-Based Corpus of Hate Online
Paper authors
Paper abstract
Although online hate speech (HS) has been an important object of research over the last decade, most HS-related corpora oversimplify the phenomenon of hate by attempting to label user comments as "hate" or "neutral". This ignores the complex and subjective nature of HS and limits the real-life applicability of classifiers trained on these corpora. In this study, we present the M-Phasis corpus, a corpus of ~9k German and French user comments collected from migration-related news articles. It goes beyond the "hate"-"neutral" dichotomy and is instead annotated with 23 features, which in combination become descriptors of various types of speech, ranging from critical comments to implicit and explicit expressions of hate. The annotations are performed by four native speakers per language and achieve high inter-annotator agreement (0.77 ≤ κ ≤ 1). Besides describing the corpus creation and presenting insights from a content, error, and domain analysis, we explore its data characteristics by training several classification baselines.