Paper Title
Reducing the Variance of Variational Estimates of Mutual Information by Limiting the Critic's Hypothesis Space to RKHS
Paper Authors
Paper Abstract
Mutual information (MI) is an information-theoretic measure of the dependency between two random variables. Several methods for estimating MI from samples of two random variables with unknown underlying probability distributions have been proposed in the literature. Recent methods realize parametric probability distributions, or a critic, as a neural network to approximate unknown density ratios. The approximated density ratios are used to estimate different variational lower bounds on MI. While these methods provide reliable estimates when the true MI is low, they produce high-variance estimates when the true MI is high. We argue that this high-variance characteristic is due to the uncontrolled complexity of the critic's hypothesis space. In support of this argument, we use the data-driven Rademacher complexity of the hypothesis space associated with the critic's architecture to analyse the generalization error bounds of variational lower-bound estimates of MI. In the proposed work, we show that it is possible to negate the high-variance characteristics of these estimators by constraining the critic's hypothesis space to a Reproducing Kernel Hilbert Space (RKHS), which corresponds to a kernel learned using Automated Spectral Kernel Learning (ASKL). By analysing the aforementioned generalization error bounds, we augment the overall optimisation objective with an effective regularisation term. We empirically demonstrate the efficacy of this regularisation in enforcing a proper bias-variance trade-off on four variational lower bounds, namely NWJ, MINE, JS and SMILE.
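To make the setup concrete, the sketch below (not the authors' implementation) estimates MI with the NWJ variational lower bound, using a critic that is a function in the RKHS of a learned spectral kernel, approximated with learnable random Fourier features in the spirit of ASKL. The names RKHSCritic and nwj_lower_bound are illustrative, and the paper's actual regularisation term, derived from its Rademacher-complexity analysis, is stood in for here by a plain penalty on the RKHS norm of the critic.

```python
# Minimal sketch: NWJ lower bound on MI with an RKHS-constrained critic.
# Assumptions: the critic f(x, y) = <w, phi([x; y])> is a linear functional
# of random Fourier features with learnable spectral frequencies, so it lies
# in the RKHS induced by the learned kernel on the joint space.
import math
import torch
import torch.nn as nn


class RKHSCritic(nn.Module):
    """Critic realised as an element of the RKHS of a learned spectral kernel."""

    def __init__(self, dim: int, num_features: int = 64):
        super().__init__()
        # Learnable spectral frequencies Omega and phases b (ASKL-style).
        self.omega = nn.Parameter(torch.randn(2 * dim, num_features))
        self.bias = nn.Parameter(2 * math.pi * torch.rand(num_features))
        # Linear functional w: the RKHS norm of the critic is ||w||.
        self.w = nn.Parameter(torch.randn(num_features) / num_features ** 0.5)
        self.num_features = num_features

    def phi(self, z: torch.Tensor) -> torch.Tensor:
        # Standard random Fourier feature map: sqrt(2/D) * cos(z @ Omega + b).
        return (2.0 / self.num_features) ** 0.5 * torch.cos(z @ self.omega + self.bias)

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # All-pairs scores f(x_i, y_j): the diagonal uses joint samples,
        # off-diagonal pairs approximate the product of marginals.
        n = x.shape[0]
        pairs = torch.cat(
            (x.unsqueeze(1).expand(n, n, -1), y.unsqueeze(0).expand(n, n, -1)),
            dim=-1,
        )
        return self.phi(pairs) @ self.w


def nwj_lower_bound(scores: torch.Tensor) -> torch.Tensor:
    """NWJ bound: E_{p(x,y)}[f] - e^{-1} * E_{p(x)p(y)}[e^f]."""
    n = scores.shape[0]
    joint = scores.diagonal().mean()
    marginal = scores.masked_select(~torch.eye(n, dtype=torch.bool)).exp().mean()
    return joint - marginal / math.e


# Toy usage: correlated Gaussians, whose true MI is known in closed form.
torch.manual_seed(0)
dim, n, rho, lam = 5, 256, 0.8, 1e-3
x = torch.randn(n, dim)
y = rho * x + (1.0 - rho ** 2) ** 0.5 * torch.randn(n, dim)

critic = RKHSCritic(dim)
opt = torch.optim.Adam(critic.parameters(), lr=1e-2)
for step in range(1000):
    opt.zero_grad()
    # Maximise the bound; lam * ||w||^2 is a stand-in regulariser that
    # penalises the RKHS norm, limiting the critic's effective complexity.
    loss = -nwj_lower_bound(critic(x, y)) + lam * critic.w.square().sum()
    loss.backward()
    opt.step()

print(f"estimated MI (nats): {nwj_lower_bound(critic(x, y)).item():.3f}")
```

The RKHS-norm penalty illustrates the abstract's central point: shrinking the critic's hypothesis space trades a small amount of bias for a large reduction in the variance of the MI estimate. The same critic and penalty can be paired with the other bounds the paper evaluates (MINE, JS, SMILE) by swapping out nwj_lower_bound.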