Paper title
Sparseness-constrained Nonnegative Tensor Factorization for Detecting Topics at Different Time Scales
Paper authors
Paper abstract
Temporal data (such as news articles or Twitter feeds) often consists of a mixture of long-lasting trends and popular but short-lasting topics of interest. A truly successful topic modeling strategy should be able to detect both types of topics and clearly locate them in time. In this paper, we first show that nonnegative CANDECOMP/PARAFAC decomposition (NCPD) is able to discover topics of variable persistence automatically. Then, we propose sparseness-constrained NCPD (S-NCPD) and its online variant to control the length of the learned topics effectively and efficiently. Further, we propose quantitative ways to measure the topic length and demonstrate the ability of S-NCPD (as well as its online variant) to discover short and long-lasting temporal topics in a controlled manner in semi-synthetic and real-world data, including news headlines. We also demonstrate that the online variant of S-NCPD reduces the reconstruction error more rapidly than S-NCPD.
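To make the NCPD idea in the abstract concrete, the following is a minimal sketch of a rank-R nonnegative CP decomposition of a 3-way tensor, fitted with standard multiplicative updates for the Frobenius loss. It is not the authors' S-NCPD algorithm, and it imposes no sparseness constraint. The function names (`ncpd`, `reconstruct`, `hoyer_sparseness`), the mode ordering (time, document, word), and the use of Hoyer's sparseness measure as a rough proxy for topic length are all assumptions made for illustration; the abstract does not specify the paper's measures.

```python
import numpy as np


def reconstruct(A, B, C):
    """Rebuild the tensor as a sum of R rank-one terms a_r ∘ b_r ∘ c_r."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def ncpd(X, rank, n_iter=200, eps=1e-12, seed=0):
    """Plain NCPD via multiplicative updates (illustrative, unconstrained).

    X is a nonnegative array of shape (I, J, K); here the first mode is
    assumed to be time. Returns the three factor matrices and the
    reconstruction error recorded after each iteration.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((K, rank))
    errors = []
    for _ in range(n_iter):
        # Each update multiplies a factor by a ratio of nonnegative terms,
        # so the factors stay nonnegative throughout.
        A *= np.einsum("ijk,jr,kr->ir", X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum("ijk,ir,kr->jr", X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum("ijk,ir,jr->kr", X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
        errors.append(np.linalg.norm(X - reconstruct(A, B, C)))
    return A, B, C, errors


def hoyer_sparseness(x):
    """Hoyer's sparseness of a nonzero vector with more than one entry.

    Equals 1 when a single entry is nonzero and 0 when all entries are equal.
    """
    n = x.size
    l1 = np.abs(x).sum()
    l2 = np.sqrt((x ** 2).sum())
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)
```

Under these assumptions, each column of the time-mode factor `A` traces how strongly one topic is present over time. A column with high `hoyer_sparseness` concentrates its mass in a few time slices, which would correspond to a short-lasting topic, while a column with low sparseness would correspond to a persistent one.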