Paper Title

The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks

Authors

Jakub Swiatkowski, Kevin Roth, Bastiaan S. Veeling, Linh Tran, Joshua V. Dillon, Jasper Snoek, Stephan Mandt, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin

Abstract

Variational Bayesian Inference is a popular methodology for approximating posterior distributions over Bayesian neural network weights. Recent work developing this class of methods has explored ever richer parameterizations of the approximate posterior in the hope of improving performance. In contrast, here we share a curious experimental finding that suggests instead restricting the variational distribution to a more compact parameterization. For a variety of deep Bayesian neural networks trained using Gaussian mean-field variational inference, we find that the posterior standard deviations consistently exhibit strong low-rank structure after convergence. This means that by decomposing these variational parameters into a low-rank factorization, we can make our variational approximation more compact without decreasing the models' performance. Furthermore, we find that such factorized parameterizations improve the signal-to-noise ratio of stochastic gradient estimates of the variational lower bound, resulting in faster convergence.
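As a rough illustration of the compactness claim, the sketch below (plain NumPy with illustrative shapes and rank, not the authors' code) builds a layer's matrix of posterior standard deviations from a rank-k factorization, so an m-by-n layer stores only (m + n) * k standard-deviation parameters instead of m * n.

```python
import numpy as np

# Rank-k ("tied") factorization of a layer's posterior standard deviations:
# instead of one free sigma per weight, sigma[i, j] = sum_r u[i, r] * v[j, r].
def tied_stddevs(u, v):
    """u: (m, k) and v: (n, k) positive factors -> (m, n) std-dev matrix."""
    return u @ v.T

rng = np.random.default_rng(0)
m, n, k = 512, 256, 2                      # layer shape and rank (illustrative values)
u = np.exp(rng.normal(size=(m, k)) - 3.0)  # exp keeps factors, hence sigmas, positive
v = np.exp(rng.normal(size=(n, k)) - 3.0)
sigma = tied_stddevs(u, v)                 # full (512, 256) std-dev matrix

# 1536 free std-dev parameters instead of 131072 for an untied mean-field posterior
print(sigma.shape, u.size + v.size, m * n)
```

In the paper's setting these factors would be learned jointly with the variational means; the sketch only shows the parameter-count bookkeeping behind the low-rank decomposition.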
