The ELBO of Variational Autoencoders Converges to a Sum of Three Entropies

Paper title

The ELBO of Variational Autoencoders Converges to a Sum of Three Entropies

Authors

Damm, Simon; Forster, Dennis; Velychko, Dmytro; Dai, Zhenwen; Fischer, Asja; Lücke, Jörg

Abstract

The central objective function of a variational autoencoder (VAE) is its variational lower bound (the ELBO). Here we show that for standard (i.e., Gaussian) VAEs the ELBO converges to a value given by the sum of three entropies: the (negative) entropy of the prior distribution, the expected (negative) entropy of the observable distribution, and the average entropy of the variational distributions (the latter is already part of the ELBO). Our analytical results are exact and apply to small as well as intricate deep encoder and decoder networks. Furthermore, they apply to finitely and infinitely many data points and at any stationary point (including local maxima and saddle points). The result implies that, for standard VAEs, the ELBO can often be computed in closed form at stationary points, whereas the original ELBO requires numerical approximations of integrals. As a main contribution, we prove that at stationary points the ELBO of a VAE equals a sum of entropies. Numerical experiments then show that the analytical results are sufficiently precise also in the vicinities of stationary points that are reached in practice. Furthermore, we discuss how the novel entropy form of the ELBO can be used to analyze and understand learning behavior. More generally, we believe our contributions can be useful for future theoretical and practical studies of VAE learning, as they provide novel information about the points in parameter space to which VAE optimization converges.
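
To illustrate why the entropy form is closed-form, here is a minimal sketch that evaluates the three-entropy sum, ELBO = (1/N) sum_n H[q(z|x_n)] - H[p(z)] - E[H[p(x|z)]], under assumed "standard VAE" choices: prior p(z) = N(0, I_H), encoder q(z|x_n) = N(mu(x_n), diag(tau^2(x_n))), and decoder p(x|z) = N(f(z), sigma^2 I_D) with a single variance sigma^2 shared across all D observed dimensions. These modeling choices and the helper names `gaussian_entropy_diag` and `entropy_sum_elbo` are hypothetical illustrations, not the authors' code. Because sigma^2 does not depend on z, the expected decoder entropy is simply (D/2) log(2*pi*e*sigma^2). Note that this value equals the ELBO only at (or near) stationary points, as stated in the abstract; at arbitrary parameters the ELBO still needs its usual sampled estimate.

```python
import numpy as np


def gaussian_entropy_diag(variances):
    """Differential entropy (in nats) of a Gaussian with diagonal covariance.

    `variances` holds the diagonal entries along the last axis.
    """
    variances = np.asarray(variances, dtype=float)
    return 0.5 * np.sum(np.log(2.0 * np.pi * np.e * variances), axis=-1)


def entropy_sum_elbo(encoder_vars, decoder_var, data_dim):
    """Entropy-sum value of the ELBO (hypothetical helper, assumed setup).

    encoder_vars: array of shape (N, H), variances tau_h^2(x_n) of q(z|x_n).
    decoder_var:  scalar sigma^2 of p(x|z) = N(f(z), sigma^2 I_D).
    data_dim:     D, the number of observed dimensions.
    """
    encoder_vars = np.atleast_2d(np.asarray(encoder_vars, dtype=float))
    _, latent_dim = encoder_vars.shape

    # Average entropy of the variational distributions over the N data points.
    avg_q_entropy = gaussian_entropy_diag(encoder_vars).mean()
    # Entropy of the assumed standard normal prior N(0, I_H).
    prior_entropy = gaussian_entropy_diag(np.ones(latent_dim))
    # Decoder entropy; independent of z, so its expectation is itself.
    decoder_entropy = gaussian_entropy_diag(np.full(data_dim, decoder_var))

    return avg_q_entropy - prior_entropy - decoder_entropy


# Example with made-up variances for N = 2 data points, H = 2 latents, D = 3.
tau2 = np.array([[0.5, 0.25], [0.1, 1.0]])
print(entropy_sum_elbo(tau2, decoder_var=0.2, data_dim=3))
```

Under these assumptions the prior entropy cancels the constant part of the encoder entropies, so the value reduces to (1/(2N)) sum_n sum_h log tau_h^2(x_n) - (D/2) log(2*pi*e*sigma^2). In practice the tau^2 values would come from a trained encoder's variance outputs and sigma^2 from the learned decoder variance.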
