Paper Title

From the Expectation Maximisation Algorithm to Autoencoded Variational Bayes

Paper Authors

Pulford, Graham W.

Paper Abstract

Although the expectation maximisation (EM) algorithm was introduced in 1970, it remains somewhat inaccessible to machine learning practitioners due to its obscure notation, terse proofs and lack of concrete links to modern machine learning techniques like autoencoded variational Bayes. This has resulted in gaps in the AI literature concerning the meaning of concepts such as "latent variables" and "variational lower bound," which are frequently used but often not clearly explained. The roots of these ideas lie in the EM algorithm. We first give a tutorial presentation of the EM algorithm for estimating the parameters of a $K$-component mixture density. The Gaussian mixture case is presented in detail using $K$-ary scalar hidden (or latent) variables rather than the more traditional binary-valued $K$-dimensional vectors. This presentation is motivated by mixture modelling from the target tracking literature. In a similar style to Bishop's 2009 book, we present variational Bayesian inference as a generalised EM algorithm stemming from the variational (or evidence) lower bound, as well as the technique of mean field approximation (or product density transform). We continue the evolution from EM to variational autoencoders, developed by Kingma & Welling in 2014. In so doing, we establish clear links between the EM algorithm and its variational counterparts, hence clarifying the meaning of "latent variables." We provide detailed coverage of the "reparametrisation trick" and focus on how the AEVB differs from conventional variational Bayesian inference. Throughout the tutorial, consistent notational conventions are used, which unifies the narrative and clarifies the concepts. Some numerical examples are given to further illustrate the algorithms.
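
As a rough illustration of the reparametrisation trick highlighted in the abstract (not code from the paper), the following NumPy sketch estimates gradients of a Gaussian expectation by Monte Carlo. The choices here are assumptions for illustration only: a 1-D variational density $q(z)=\mathcal{N}(\mu,\sigma^2)$ and a toy integrand $f(z)=z^2$.

```python
# Minimal sketch of the reparametrisation trick (illustrative assumptions only).
# Writing z ~ N(mu, sigma^2) as z = mu + sigma * eps with eps ~ N(0, 1) makes the
# sample a deterministic function of (mu, sigma), so gradients of E_q[f(z)] can be
# estimated by differentiating through the mapping and averaging over samples.

import numpy as np

rng = np.random.default_rng(0)

def f_prime(z):
    # Derivative of the toy integrand f(z) = z**2.
    return 2.0 * z

def grad_estimate(mu, sigma, n_samples=100_000):
    eps = rng.standard_normal(n_samples)
    z = mu + sigma * eps          # reparametrised samples
    dz = f_prime(z)
    d_mu = np.mean(dz * 1.0)      # chain rule: dz/dmu = 1
    d_sigma = np.mean(dz * eps)   # chain rule: dz/dsigma = eps
    return d_mu, d_sigma

# Analytic check: E[z^2] = mu^2 + sigma^2, so d/dmu = 2*mu and d/dsigma = 2*sigma.
mu, sigma = 1.5, 0.7
print(grad_estimate(mu, sigma))   # approximately (3.0, 1.4)
```

The same idea, applied to the evidence lower bound of a latent-variable model rather than this toy integrand, is what allows AEVB to train the encoder and decoder jointly by stochastic gradient methods.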
