Paper Title

Intersectional Bias in Hate Speech and Abusive Language Datasets

Paper Authors

Jae Yeon Kim, Carlos Ortiz, Sarah Nam, Sarah Santiago, Vivek Datta

Paper Abstract

Algorithms are widely applied to detect hate speech and abusive language in social media. We investigated whether the human-annotated data used to train these algorithms are biased. We utilized a publicly available annotated Twitter dataset (Founta et al. 2018) and classified the racial, gender, and party identification dimensions of 99,996 tweets. The results showed that African American tweets were up to 3.7 times more likely to be labeled as abusive, and African American male tweets were up to 77% more likely to be labeled as hateful compared to the others. These patterns were statistically significant and robust even when party identification was added as a control variable. This study provides the first systematic evidence on intersectional bias in datasets of hate speech and abusive language.
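The abstract reports odds-style comparisons (e.g., African American tweets up to 3.7 times more likely to be labeled as abusive) that remain significant when party identification is added as a control. Below is a minimal, illustrative sketch of how such odds ratios could be estimated with a logistic regression and an interaction term for the intersectional (race × gender) effect. The column names (label_abusive, african_american, male, party_id), the synthetic data, and the use of statsmodels are assumptions for illustration, not the authors' actual code or variable names.

```python
# Illustrative sketch (not the authors' code): estimating odds ratios for the
# "abusive" label from binary demographic indicators, with party identification
# included as a control variable.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical dataframe: one row per tweet, with binary columns standing in
# for the annotated label and the classified identity dimensions.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "label_abusive":    rng.binomial(1, 0.1, n),  # 1 = annotated as abusive
    "african_american": rng.binomial(1, 0.3, n),  # inferred racial dimension
    "male":             rng.binomial(1, 0.5, n),  # inferred gender dimension
    "party_id":         rng.binomial(1, 0.5, n),  # inferred party identification (control)
})

# Logistic regression with a race x gender interaction to capture the
# intersectional effect, controlling for party identification.
model = smf.logit("label_abusive ~ african_american * male + party_id", data=df).fit()

# Exponentiated coefficients are odds ratios, i.e., how many times more likely
# a tweet with a given attribute is to receive the "abusive" label.
print(np.exp(model.params))
```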
