Title

Transferring Knowledge via Neighborhood-Aware Optimal Transport for Low-Resource Hate Speech Detection

Authors

Tulika Bose, Irina Illina, Dominique Fohr

Abstract

The concerning rise of hateful content on online platforms has increased the attention towards automatic hate speech detection, commonly formulated as a supervised classification task. State-of-the-art deep learning-based approaches usually require a substantial amount of labeled resources for training. However, annotating hate speech resources is expensive, time-consuming, and often harmful to the annotators. This creates a pressing need to transfer knowledge from the existing labeled resources to low-resource hate speech corpora with the goal of improving system performance. For this, neighborhood-based frameworks have been shown to be effective. However, they have limited flexibility. In our paper, we propose a novel training strategy that allows flexible modeling of the relative proximity of neighbors retrieved from a resource-rich corpus to learn the amount of transfer. In particular, we incorporate neighborhood information with Optimal Transport, which permits exploiting the geometry of the data embedding space. By aligning the joint embedding and label distributions of neighbors, we demonstrate substantial improvements over strong baselines, in low-resource scenarios, on different publicly available hate speech corpora.