Automatic Identification and Classification of Bragging in Social Media

Paper Title

Automatic Identification and Classification of Bragging in Social Media

Authors

Mali Jin, Daniel Preoţiuc-Pietro, A. Seza Doğruöz, Nikolaos Aletras

Abstract

Bragging is a speech act employed with the goal of constructing a favorable self-image through positive statements about oneself. It is widespread in daily communication and especially popular in social media, where users aim to build a positive image of their persona directly or indirectly. In this paper, we present the first large scale study of bragging in computational linguistics, building on previous research in linguistics and pragmatics. To facilitate this, we introduce a new publicly available data set of tweets annotated for bragging and their types. We empirically evaluate different transformer-based models injected with linguistic information in (a) binary bragging classification, i.e., if tweets contain bragging statements or not; and (b) multi-class bragging type prediction including not bragging. Our results show that our models can predict bragging with macro F1 up to 72.42 and 35.95 in the binary and multi-class classification tasks respectively. Finally, we present an extensive linguistic and error analysis of bragging prediction to guide future research on this topic.
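
The abstract reports results as macro F1, which is the unweighted mean of the per-class F1 scores, so a rare class such as bragging counts as much as the majority class. The following is a minimal sketch of that metric in plain Python. The function name `macro_f1` and the toy labels are assumptions made for illustration; they are not taken from the paper or its data set, and the paper's scores appear to be on a 0 to 100 scale rather than the 0 to 1 scale used here.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the true class but was missed
    # F1 = 2*TP / (2*TP + FP + FN); the denominator is non-zero because
    # every label in `labels` occurs at least once in gold or pred.
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(scores) / len(scores)


# Toy labels, invented for illustration only (hypothetical, not real data).
gold = ["brag", "not", "not", "not", "brag", "not"]
pred = ["brag", "not", "not", "brag", "not", "not"]
score = macro_f1(gold, pred)
print(round(score, 3))
```

In this toy case the "brag" class has F1 of 0.5 and the "not" class 0.75, so the macro average is 0.625 even though plain accuracy would be 4 out of 6.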
