Paper Title
XD at SemEval-2020 Task 12: Ensemble Approach to Offensive Language Identification in Social Media Using Transformer Encoders
Paper Authors
Paper Abstract
This paper presents six document classification models built on the latest transformer encoders, together with a high-performing ensemble model, for offensive language identification in social media. In the individual models, deep transformer layers apply multi-head attention. In the ensemble model, the utterance representations taken from those individual models are concatenated and fed into a linear decoder to make the final decision. The ensemble model outperforms the individual models, with up to 8.6% improvement over them on the development set. On the test set, it achieves a macro-F1 of 90.9% and ranks among the high-performing systems out of 85 participants in sub-task A of this shared task. Our analysis shows that although the ensemble model significantly improves accuracy on the development set, the improvement is less evident on the test set.
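The ensemble step described in the abstract (concatenating per-model utterance representations and passing them to a linear decoder) might look roughly like the sketch below. It is only an illustration under stated assumptions, not the authors' implementation: the names `concat_representations` and `LinearDecoder`, the vector sizes, the random weight initialization, and the two-class output are all hypothetical, and a real system would learn the weights by training.

```python
import random


def concat_representations(reps):
    """Concatenate per-model utterance vectors into one flat feature vector."""
    return [x for rep in reps for x in rep]


class LinearDecoder:
    """A single linear layer: logits = W * features + b (hypothetical helper)."""

    def __init__(self, in_dim, num_classes, seed=0):
        rng = random.Random(seed)
        # Assumption: small random weights stand in for trained parameters.
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
            for _ in range(num_classes)
        ]
        self.bias = [0.0] * num_classes

    def logits(self, features):
        if len(features) != len(self.weights[0]):
            raise ValueError("feature size does not match decoder input size")
        return [
            sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(self.weights, self.bias)
        ]

    def predict(self, features):
        """Return the index of the highest-scoring class."""
        scores = self.logits(features)
        return max(range(len(scores)), key=scores.__getitem__)


# Assumption: three encoders with different hidden sizes, toy values only.
reps = [[0.2, -0.1, 0.4], [0.5, 0.3], [-0.2, 0.1, 0.0, 0.7]]
features = concat_representations(reps)
decoder = LinearDecoder(in_dim=len(features), num_classes=2)
label = decoder.predict(features)
```

Because the decoder sees all representations side by side, it can weight each encoder's features differently per class, which is one plausible reason such an ensemble can outperform its individual members.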