Paper Title

Finetuning for Sarcasm Detection with a Pruned Dataset

Authors

Ishita Goyal, Priyank Bhandia, Sanjana Dulam

Abstract

Sarcasm is a form of irony that involves saying or writing the opposite of what one really means, often in a humorous or mocking way. It is often used to mock or ridicule someone or something, or to be humorous or amusing. Sarcasm is usually conveyed through tone of voice, facial expressions, or other forms of nonverbal communication, but it can also be indicated by certain words or phrases that are typically associated with irony or humor. Sarcasm detection is difficult because it relies on context and nonverbal cues. Sarcasm can also be culturally specific, subjective, and ambiguous. In this work, we fine-tune the RoBERTa-based sarcasm detection model presented in Abaskohi et al. [2022], reaching within 0.02 F1 of the state-of-the-art (Hercog et al. [2022]) on the iSarcasm dataset (Oprea and Magdy [2019]). This performance is achieved by augmenting iSarcasm with a pruned version of the Self-Annotated Reddit Corpus (SARC) (Khodak et al. [2017]). Our pruned version is 100 times smaller than the subset of SARC used to train the state-of-the-art model.
