Thesis Title
Scalable Approximate Inference and Some Applications
Thesis Author
Thesis Abstract
Approximate inference in probabilistic models is a fundamental task in machine learning, providing powerful tools for Bayesian reasoning, decision making, and Bayesian deep learning. The main goal is to estimate the expectation of functions of interest with respect to a target distribution. For high-dimensional probability models and large datasets, efficient approximate inference becomes critically important. In this thesis, we propose a new framework for approximate inference that combines the advantages of three mainstream inference frameworks and overcomes their limitations. The four algorithms we propose are motivated by recent computational advances in Stein's method. They apply to both continuous and discrete distributions, in settings where gradient information about the target distribution is either available or unavailable, and theoretical analysis is provided to prove their convergence. Our adaptive importance sampling (IS) algorithm iteratively improves the importance proposal by functionally decreasing the KL divergence between the updated proposal and the target. When the gradient of the target is unavailable, our proposed sampling algorithm leverages the gradient of a surrogate model and corrects the induced bias with importance weights, significantly outperforming other gradient-free sampling algorithms. In addition, our theoretical results enable us to perform goodness-of-fit tests on discrete distributions. At the end of the thesis, we propose an importance-weighted method to efficiently aggregate local models in distributed learning with one-shot communication. Results on simulated and real datasets demonstrate the statistical efficiency and broad applicability of our algorithms.
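The gradient-free sampler is described above only at a high level. As a rough illustration of the underlying idea, and not the thesis's actual algorithm, the sketch below runs Langevin dynamics on a tractable surrogate density q whose gradient is known, then corrects the bias toward a black-box target p with self-normalized importance weights. The 1-D Gaussian setup, step size, and particle count are all illustrative assumptions.

```python
import numpy as np

# Hypothetical 1-D example: the target density p is treated as a black
# box (its gradient is assumed unavailable), so we sample by running
# unadjusted Langevin dynamics on a tractable surrogate q and correct
# the induced bias with importance weights w = p(x) / q(x) when
# estimating E_p[f(X)].

def log_p(x):
    # Black-box target: N(2, 1), up to an additive constant.
    return -0.5 * (x - 2.0) ** 2

def log_q(x):
    # Surrogate: N(0, 2^2), up to an additive constant.
    return -0.5 * (x / 2.0) ** 2

def grad_log_q(x):
    # Closed-form score of the surrogate, used in place of grad log p.
    return -x / 4.0

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)          # initial particles
eps = 0.05                             # Langevin step size
for _ in range(500):
    # Unadjusted Langevin dynamics targeting the surrogate q.
    x = x + eps * grad_log_q(x) + np.sqrt(2 * eps) * rng.standard_normal(x.shape)

# Self-normalized importance weights correct q -> p; normalizing
# constants of p and q cancel in the normalization step.
log_w = log_p(x) - log_q(x)
w = np.exp(log_w - log_w.max())
w /= w.sum()

f = lambda t: t                        # estimate E_p[X]; the true value is 2
print("importance-weighted estimate:", np.sum(w * f(x)))
```

Without the reweighting step, the estimate would be biased toward the surrogate's mean of 0; with the weights, it should land near the target mean of 2, which is the bias-correction effect the abstract refers to.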