Paper Title
Twitmo: A Twitter Data Topic Modeling and Visualization Package for R
Paper Authors
Paper Abstract
We present Twitmo, a package that provides a broad range of methods to collect, pre-process, analyze, and visualize geo-tagged Twitter data. Twitmo enables the user to collect geo-tagged Tweets from Twitter and provides a comprehensive, user-friendly toolbox for generating topic distributions from latent Dirichlet allocation (LDA), correlated topic models (CTM), and structural topic models (STM). Functions are included for text pre-processing, model building, and prediction. In addition, one of the innovations of the package is the automatic pooling of Tweets into longer pseudo-documents using hashtags and cosine similarities for better topic coherence. The package also comes with functionality to visualize collected data sets and fitted models in both static and interactive ways, and it offers built-in support for model visualization via LDAvis, which is convenient for researchers in this area. The Twitmo package is an innovative toolbox that can be used to analyze public discourse on various topics, political parties, or persons of interest in space and time.
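The hashtag pooling idea mentioned in the abstract can be illustrated with a minimal sketch. It is written in dependency-free Python rather than R and is not Twitmo's implementation or API. The names `tokenize`, `cosine`, and `pool_by_hashtag`, the use of raw term counts as vectors, and the 0.3 similarity threshold are all assumptions made for illustration. Tweets that share a hashtag are merged into one pseudo-document, and a tweet without hashtags joins the most similar pool only if its cosine similarity exceeds the threshold.

```python
import math
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#(\w+)")
TOKEN = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase a tweet and split it into word tokens (the '#' sign is dropped)."""
    return TOKEN.findall(text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pool_by_hashtag(tweets, threshold=0.3):
    """Group tweets into pseudo-documents keyed by hashtag (illustrative only).

    The threshold value is an assumption, not a value taken from the paper.
    """
    pools = defaultdict(list)
    unlabeled = []
    for tweet in tweets:
        tags = {tag.lower() for tag in HASHTAG.findall(tweet)}
        if tags:
            for tag in tags:
                pools[tag].append(tweet)
        else:
            unlabeled.append(tweet)

    # One term-count vector per pool, built from the hashtagged tweets only.
    vectors = {
        tag: Counter(tok for tweet in docs for tok in tokenize(tweet))
        for tag, docs in pools.items()
    }

    # Attach each hashtag-free tweet to its most similar pool, if similar enough.
    for tweet in unlabeled:
        vec = Counter(tokenize(tweet))
        best_tag, best_sim = None, threshold
        for tag, pool_vec in vectors.items():
            sim = cosine(vec, pool_vec)
            if sim > best_sim:
                best_tag, best_sim = tag, sim
        if best_tag is not None:
            pools[best_tag].append(tweet)
        # Tweets below the threshold are left out of every pool in this sketch.

    return dict(pools)


if __name__ == "__main__":
    example = [
        "Fitting an LDA topic model in R #rstats",
        "New topic model package released #rstats",
        "Great goal in the final tonight #football",
        "Which topic model package should I use in R?",
    ]
    for tag, docs in pool_by_hashtag(example).items():
        print(tag, len(docs))
```

Each resulting pool can then be concatenated into a single longer document before fitting a topic model, which is the motivation the abstract gives for pooling short tweets.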