Paper title
Adaptation of domain-specific transformer models with text oversampling for sentiment analysis of social media posts on Covid-19 vaccines
Paper authors
Paper abstract
Covid-19 has spread across the world, and several vaccines have been developed to counter its surge. To identify the correct sentiments associated with the vaccines in social media posts, we fine-tune various state-of-the-art pre-trained transformer models on tweets associated with Covid-19 vaccines. Specifically, we use the recently introduced state-of-the-art pre-trained transformer models RoBERTa, XLNet and BERT, and the domain-specific transformer models CT-BERT and BERTweet that are pre-trained on Covid-19 tweets. We further explore the option of text augmentation by oversampling with the Language Model based Oversampling Technique (LMOTE) to improve the accuracy of these models, specifically for small-sample datasets with an imbalanced class distribution among the positive, negative and neutral sentiment classes. Our results summarize our findings on the suitability of text oversampling for imbalanced small-sample datasets used to fine-tune state-of-the-art pre-trained transformer models, and on the utility of domain-specific transformer models for the classification task.
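As an illustration of the class-balancing step the abstract describes, the sketch below oversamples minority sentiment classes until all classes match the majority count. This is a deliberately simplified stand-in, not the authors' LMOTE implementation: LMOTE generates new synthetic minority-class texts by masking tokens and filling them with a language model such as BERT, whereas this sketch merely duplicates existing minority samples.

```python
import random
from collections import Counter

def oversample_to_balance(texts, labels, seed=0):
    """Naively oversample minority classes by duplicating random
    existing samples until every class reaches the majority-class
    count. (LMOTE would instead synthesize new texts with a masked
    language model; this is only a schematic substitute.)"""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # majority-class size

    # Group texts by their sentiment label.
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    out_texts, out_labels = [], []
    for label, class_texts in by_class.items():
        # Add random duplicates until this class reaches `target`.
        extras = [rng.choice(class_texts)
                  for _ in range(target - len(class_texts))]
        for text in class_texts + extras:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels

# Hypothetical imbalanced toy dataset: 4 positive, 2 negative, 1 neutral.
texts = ["great vaccine", "feeling safe", "grateful", "relieved",
         "bad side effects", "worried", "got my dose today"]
labels = ["pos", "pos", "pos", "pos", "neg", "neg", "neu"]

bal_texts, bal_labels = oversample_to_balance(texts, labels)
print(Counter(bal_labels))  # every class now has 4 samples
```

The balanced dataset would then be used to fine-tune the transformer classifier; with LMOTE, the duplicated samples are replaced by language-model-generated variants, which adds lexical diversity instead of exact copies.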