Paper Title

CXP949 at WNUT-2020 Task 2: Extracting Informative COVID-19 Tweets -- RoBERTa Ensembles and The Continued Relevance of Handcrafted Features

Authors

Calum Perrio, Harish Tayyar Madabushi

Abstract

This paper presents our submission to Task 2 of the Workshop on Noisy User-generated Text. We explore improving the performance of a pre-trained transformer-based language model fine-tuned for text classification through an ensemble implementation that makes use of corpus-level information and a handcrafted feature. We test the effectiveness of including the aforementioned features in accommodating the challenges of a noisy data set centred on a specific subject outside the remit of the pre-training data. We show that inclusion of additional features can improve classification results and achieve a score within 2 points of the top-performing team.
