UTNLP at SemEval-2022 Task 6: A Comparative Analysis of Sarcasm Detection Using Generative-based and Mutation-based Data Augmentation

Paper Title

UTNLP at SemEval-2022 Task 6: A Comparative Analysis of Sarcasm Detection Using Generative-based and Mutation-based Data Augmentation

Authors

Amirhossein Abaskohi, Arash Rasouli, Tanin Zeraati, Behnam Bahrak

Abstract

Sarcasm is the use of words to mock, irritate, or amuse someone, and it is common on social media. Its metaphorical and creative nature poses a significant difficulty for sentiment analysis systems based on affective computing. This paper presents the methodology and results of our team, UTNLP, in SemEval-2022 Shared Task 6 on sarcasm detection. We put different models and data augmentation approaches to the test and report which works best. The tests begin with traditional machine learning models and progress to transformer-based and attention-based models. We employed data augmentation based on data mutation and data generation. Using RoBERTa and mutation-based data augmentation, our best approach achieved an F1-sarcastic of 0.38 in the competition's evaluation phase. After the competition, we fixed our model's flaws and achieved an F1-sarcastic of 0.414.
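
The abstract reports F1-sarcastic without defining it. A minimal sketch follows, assuming the metric is the F1 score computed on the sarcastic class only and that label 1 marks a sarcastic text. The function name `f1_for_class` and the toy labels are hypothetical illustrations, not the authors' code or data.

```python
def f1_for_class(y_true, y_pred, positive=1):
    """F1 score for a single class (assumed here: 1 = sarcastic)."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives for the class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy labels, for illustration only.
gold = [1, 0, 0, 1, 0, 1, 0, 0]
pred = [1, 0, 1, 0, 0, 1, 0, 0]
score = f1_for_class(gold, pred)
print(f"F1 on the sarcastic class: {score:.3f}")
```

Scoring only the positive class means correct predictions on the non-sarcastic examples do not raise the score, which matters when sarcastic texts are the minority class.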
