Auto-Encoding Variational Bayes for Inferring Topics and Visualization

Paper Title

Auto-Encoding Variational Bayes for Inferring Topics and Visualization

Authors

Pham, Dang; Le, Tuan M. V.

Abstract

Visualization and topic modeling are widely used approaches for text analysis. Traditional visualization methods find low-dimensional representations of documents in the visualization space (typically 2D or 3D) that can be displayed using a scatterplot. In contrast, topic modeling aims to discover topics from text, but for visualization, one needs to perform a post-hoc embedding using dimensionality reduction methods. Recent approaches propose using a generative model to jointly find topics and visualization, allowing the semantics to be infused in the visualization space for a meaningful interpretation. A major challenge that prevents these methods from being used practically is the scalability of their inference algorithms. We present, to the best of our knowledge, the first fast Auto-Encoding Variational Bayes based inference method for jointly inferring topics and visualization. Since our method is black box, it can handle model changes efficiently with little mathematical rederivation effort. We demonstrate the efficiency and effectiveness of our method on real-world large datasets and compare it with existing baselines.
