SATLab at SemEval-2022 Task 4: Trying to Detect Patronizing and Condescending Language with only Character and Word N-grams

Paper Title

SATLab at SemEval-2022 Task 4: Trying to Detect Patronizing and Condescending Language with only Character and Word N-grams

Authors

Bestgen, Yves

Abstract

A logistic regression model fed only with character and word n-grams is proposed for SemEval-2022 Task 4 on Patronizing and Condescending Language (PCL) Detection. It obtained an average level of performance, well above that of a system that guesses without using any knowledge about the task, but much lower than that of the best teams. As the proposed model is very similar to one that performed well on a task requiring the automatic identification of hate speech and offensive content, this paper confirms the difficulty of PCL detection.
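The abstract names only the model family (logistic regression) and the feature types (character and word n-grams). The sketch below shows what such a classifier can look like, under stated assumptions that are not the author's settings: character 1- to 4-grams, word 1- and 2-grams, whitespace tokenization, L2-normalized counts, and plain stochastic gradient descent in place of a library solver so the example needs only the standard library. The names `featurize`, `NgramLogReg`, and the toy sentences are hypothetical and serve only as illustration.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n_min=1, n_max=4):
    """Character n-grams of the lowercased text, padded with one space at each end."""
    t = f" {text.lower()} "
    return [t[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(t) - n + 1)]


def word_ngrams(text, n_min=1, n_max=2):
    """Word n-grams over a simple whitespace tokenization (an assumption)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def featurize(text):
    """Sparse feature dict: prefixed n-gram counts, L2-normalized."""
    feats = Counter("c:" + g for g in char_ngrams(text))  # "c:" marks character n-grams
    feats.update("w:" + g for g in word_ngrams(text))     # "w:" marks word n-grams
    norm = math.sqrt(sum(v * v for v in feats.values()))
    return {k: v / norm for k, v in feats.items()}


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class NgramLogReg:
    """Binary logistic regression (1 = PCL, 0 = not PCL) trained with SGD on log loss."""

    def __init__(self, lr=0.5, l2=1e-4, epochs=30, seed=0):
        self.lr, self.l2, self.epochs, self.seed = lr, l2, epochs, seed
        self.w = {}   # sparse weight vector: feature name -> weight
        self.b = 0.0  # bias term

    def _score(self, feats):
        return self.b + sum(self.w.get(k, 0.0) * v for k, v in feats.items())

    def fit(self, texts, labels):
        data = [(featurize(t), y) for t, y in zip(texts, labels)]
        rng = random.Random(self.seed)
        for _ in range(self.epochs):
            rng.shuffle(data)
            for feats, y in data:
                g = sigmoid(self._score(feats)) - y  # d(log loss)/d(score)
                self.b -= self.lr * g
                for k, v in feats.items():
                    # L2 penalty applied only to active features (a common sparse shortcut)
                    w = self.w.get(k, 0.0)
                    self.w[k] = w - self.lr * (g * v + self.l2 * w)
        return self

    def predict_proba(self, text):
        return sigmoid(self._score(featurize(text)))

    def predict(self, text, threshold=0.5):
        return int(self.predict_proba(text) >= threshold)


# Hypothetical toy data for illustration only (not from the task dataset)
texts = [
    "these poor souls need our help to survive",
    "we must help the less fortunate who cannot help themselves",
    "poor families are so grateful for our charity",
    "the council approved the new housing budget",
    "refugees filed a legal challenge against the policy",
    "the report lists unemployment figures by region",
]
labels = [1, 1, 1, 0, 0, 0]
model = NgramLogReg().fit(texts, labels)
p = model.predict_proba("we should help these poor people")
```

Prefixing each feature with `c:` or `w:` keeps a one-character word such as "a" from colliding with the character unigram "a". The decision `threshold` is exposed as a parameter because the source does not say how the author turned probabilities into labels.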