Paper Title

TopicBERT: A Transformer transfer learning based memory-graph approach for multimodal streaming social media topic detection

Authors

Meysam Asgari-Chenaghlu, Mohammad-Reza Feizi-Derakhshi, Leili Farzinvash, Mohammad-Ali Balafar, Cina Motamed

Abstract

The real-time nature of social networks, with bursty short messages and large volumes of data spread across a wide variety of topics, is of interest to many researchers. These properties, known as the 5 Vs of big data, have led to many unique and insightful algorithms and techniques being applied to large social networking datasets and data streams. Much of this research is based on detecting and tracking hot topics and trending social media events, which helps answer many open questions. These algorithms, and in some cases software products, mostly rely on the nature of the language itself. Other techniques, such as unsupervised data mining methods, are language independent, but they do not meet many of the requirements for a comprehensive solution. Research issues such as noisy sentences with poor grammar and new words invented by online users make it hard to maintain a good social network topic detection and tracking methodology. Many of these studies also ignore the semantic relationships between words and, in most cases, synonyms. In this research, we use Transformers combined with an incremental community detection algorithm. On the one hand, the Transformer provides the semantic relations between words in different contexts. On the other hand, the proposed graph mining technique enhances the resulting topics with the aid of simple structural rules. Named entity recognition on multimodal data (image and text) labels the named entities with their entity types, and the extracted topics are tuned using them. All operations of the proposed system are implemented from a big social data perspective using NoSQL technologies. To present a working and systematic solution, we combine MongoDB and Neo4j as the two major database systems of our work. The proposed system shows higher precision and recall than other methods on three different datasets.
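To make the general idea in the abstract more concrete, the following is a minimal sketch of grouping streamed words into topics by semantic similarity. It is not the authors' algorithm. The hand-written three-dimensional vectors stand in for Transformer embeddings, a union-find over thresholded cosine similarity stands in for the incremental community detection step, and the names `IncrementalTopicGraph`, `add_word`, `topics`, and the threshold value are all hypothetical. Named entity tuning and the MongoDB and Neo4j storage layers are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class IncrementalTopicGraph:
    """Hypothetical helper: words are nodes, and two words are linked when
    their vectors are similar enough. Linked groups are reported as topics.
    Connected components are a simplification of community detection."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed similarity cutoff
        self.vectors = {}           # word -> vector
        self.parent = {}            # union-find parent pointers

    def _find(self, word):
        # Follow parent pointers to the group root, halving paths on the way.
        while self.parent[word] != word:
            self.parent[word] = self.parent[self.parent[word]]
            word = self.parent[word]
        return word

    def add_word(self, word, vector):
        """Insert one streamed word and merge it with similar existing words."""
        if word in self.vectors:
            return
        self.parent[word] = word
        for other, other_vector in self.vectors.items():
            if cosine(vector, other_vector) >= self.threshold:
                self.parent[self._find(word)] = self._find(other)
        self.vectors[word] = vector

    def topics(self):
        """Return the current groups as sorted lists of words."""
        groups = {}
        for word in self.vectors:
            groups.setdefault(self._find(word), set()).add(word)
        return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Toy stream: the vectors are made up for illustration only.
stream = [
    ("earthquake", [0.90, 0.10, 0.00]),
    ("quake", [0.85, 0.15, 0.05]),
    ("election", [0.00, 0.20, 0.90]),
    ("vote", [0.05, 0.10, 0.95]),
    ("tremor", [0.80, 0.20, 0.10]),
]

graph = IncrementalTopicGraph(threshold=0.8)
for word, vector in stream:
    graph.add_word(word, vector)

print(graph.topics())
```

Because each word is merged into the existing groups as it arrives, the topic structure is updated without reprocessing earlier messages, which is the property an incremental approach is meant to provide for streaming data.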
