Paper Title
HABERTOR: An Efficient and Effective Deep Hatespeech Detector
Paper Authors
Paper Abstract
We present our HABERTOR model for detecting hatespeech in large-scale user-generated content. Inspired by the recent success of the BERT model, we propose several modifications to BERT to enhance performance on the downstream hatespeech classification task. HABERTOR inherits BERT's architecture but differs in four aspects: (i) it generates its own vocabulary and is pre-trained from scratch using the largest-scale hatespeech dataset; (ii) it consists of Quaternion-based factorized components, resulting in far fewer parameters, faster training and inference, and lower memory usage; (iii) it uses our proposed multi-source ensemble heads with a pooling layer for separate input sources to further enhance its effectiveness; and (iv) it uses regularized adversarial training with our proposed fine-grained and adaptive noise magnitude to enhance its robustness. Through experiments on a large-scale real-world hatespeech dataset with 1.4M annotated comments, we show that HABERTOR outperforms 15 state-of-the-art hatespeech detection methods, including fine-tuned language models. In particular, compared with BERT, HABERTOR is 4 to 5 times faster in the training/inference phase, uses less than 1/3 of the memory, and performs better, even though we pre-train it with less than 1% of the number of words. Our generalizability analysis shows that HABERTOR transfers well to other unseen hatespeech datasets and is a more efficient and effective alternative to BERT for hatespeech classification.
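The parameter savings claimed for the Quaternion-based components can be illustrated with the standard Hamilton-product parameterization used in quaternion neural networks. The sketch below is illustrative only (it is not the paper's actual implementation): a real linear layer with a (d_out × d_in) weight matrix is replaced by four (d_out/4 × d_in/4) component matrices that interact via the Hamilton product, so every output component still mixes all four input components while the layer stores roughly 1/4 of the weights.

```python
import numpy as np

def quaternion_linear(x, wr, wi, wj, wk):
    """Hamilton-product linear map. The input vector is split into four
    quaternion components (r, i, j, k); each output component combines
    all four input components, preserving full cross-component mixing
    with ~4x fewer parameters than a dense real-valued layer."""
    r, i, j, k = np.split(x, 4)
    return np.concatenate([
        wr @ r - wi @ i - wj @ j - wk @ k,   # real part
        wr @ i + wi @ r + wj @ k - wk @ j,   # i part
        wr @ j - wi @ k + wj @ r + wk @ i,   # j part
        wr @ k + wi @ j - wj @ i + wk @ r,   # k part
    ])

d_in, d_out = 16, 16
rng = np.random.default_rng(0)
# Four (d_out/4, d_in/4) blocks: 4 * 4*4 = 64 weights,
# versus a dense real matrix with 16*16 = 256 weights.
ws = [rng.standard_normal((d_out // 4, d_in // 4)) for _ in range(4)]
y = quaternion_linear(rng.standard_normal(d_in), *ws)
print(y.shape)               # (16,)
print(sum(w.size for w in ws), d_out * d_in)  # 64 256
```

The same sharing is what lets a quaternion-factorized Transformer cut its feed-forward and projection parameters roughly fourfold, consistent with the memory and speed gains reported in the abstract.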