Overview of the HASOC Subtrack at FIRE 2022: Offensive Language Identification in Marathi

Paper Title

Overview of the HASOC Subtrack at FIRE 2022: Offensive Language Identification in Marathi

Authors

Tharindu Ranasinghe, Kai North, Damith Premasiri, Marcos Zampieri

Abstract

The widespread presence of offensive content online has become a cause for great concern in recent years, motivating researchers to develop robust systems capable of identifying such content automatically. To enable a fair evaluation of these systems, several international competitions have been organized, providing the community with important benchmark data and evaluation methods for various languages. Organized since 2019, the HASOC (Hate Speech and Offensive Content Identification) shared task is one of these initiatives. In its fourth iteration, HASOC 2022 included three subtracks for English, Hindi, and Marathi. In this paper, we report the results of the HASOC 2022 Marathi subtrack, which provided participants with a dataset of Twitter data manually annotated using the popular OLID taxonomy. The Marathi track featured three additional subtracks, each corresponding to one level of the taxonomy: Task A - offensive content identification (offensive vs. non-offensive); Task B - categorization of offensive types (targeted vs. untargeted); and Task C - offensive target identification (individual vs. group vs. others). Overall, 10 teams submitted 59 runs. The best systems obtained an F1 of 0.9745 for Subtrack 3A, an F1 of 0.9207 for Subtrack 3B, and an F1 of 0.9607 for Subtrack 3C. The best-performing algorithms were a mixture of traditional and deep learning approaches.
