HERDPhobia: A Dataset for Hate Speech against Fulani in Nigeria

Paper title

HERDPhobia: A Dataset for Hate Speech against Fulani in Nigeria

Authors

Aliyu, Saminu Mohammad; Wajiga, Gregory Maksha; Murtala, Muhammad; Muhammad, Shamsuddeen Hassan; Abdulmumin, Idris; Ahmad, Ibrahim Said

Abstract

Social media platforms allow users to freely share their opinions on any issue. However, they also make it easier to spread hateful and abusive content. The Fulani ethnic group has been a victim of this phenomenon. This paper introduces HERDPhobia, the first annotated hate speech dataset on Fulani herders in Nigeria, covering three languages: English, Nigerian-Pidgin, and Hausa. We present a benchmark experiment using pre-trained language models to classify tweets as either hateful or non-hateful. Our experiment shows that the XLM-T model provides better performance, with a weighted F1 of 99.83%. We released the dataset at https://github.com/hausanlp/HERDPhobia for further research.
