Paper Title

Optimal Posteriors for Chi-squared Divergence based PAC-Bayesian Bounds and Comparison with KL-divergence based Optimal Posteriors and Cross-Validation Procedure

Paper Authors

Puja Sahu, Nandyala Hemachandra

Paper Abstract

We investigate optimal posteriors for the recently introduced \cite{begin2016pac} chi-squared divergence based PAC-Bayesian bounds in terms of the nature of their distribution, scalability of computations, and test set performance. For a finite classifier set, we deduce bounds for three distance functions: KL-divergence, and linear and squared distances. Optimal posterior weights are proportional to deviations of empirical risks, usually with subset support. For a uniform prior, it is sufficient to search among posteriors on classifier subsets ordered by these risks. We show that the bound minimization for linear distance is a convex program and obtain a closed-form expression for its optimal posterior, whereas the minimization for squared distance is a quasi-convex program under a specific condition, and the one for KL-divergence is a non-convex optimization (a difference of convex functions). To compute such optimal posteriors, we derive fast-converging fixed point (FP) equations. We apply these approaches to a finite set of SVM regularization parameter values to yield stochastic SVMs with tight bounds. We perform a comprehensive performance comparison between our optimal posteriors and known KL-divergence based posteriors on a variety of UCI datasets with varying ranges and variances of risk values. Chi-squared divergence based posteriors have weaker bounds and worse test errors, hinting at an underlying regularization by the KL-divergence based posteriors. Our study highlights the impact of the divergence function on the performance of PAC-Bayesian classifiers. We also compare our stochastic classifiers with a cross-validation based deterministic classifier. The latter has better test errors, but ours are more sample-robust, have quantifiable generalization guarantees, and are computationally much faster.
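The abstract revolves around a finite classifier set, a Gibbs (stochastic) classifier drawn from a posterior Q, and the chi-squared and KL divergences between Q and a prior P. The sketch below illustrates only this setting, not the paper's method: the synthetic data, the grid of SVM regularization values, and the exponential-in-risk placeholder posterior are assumptions made purely for illustration, whereas the paper obtains Q by minimizing the respective PAC-Bayesian bounds (in closed form or via fixed point equations).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Finite classifier set: one linear SVM per regularization value C (illustrative grid).
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

C_grid = [0.01, 0.1, 1.0, 10.0, 100.0]
models = [LinearSVC(C=C, max_iter=5000).fit(X_tr, y_tr) for C in C_grid]
emp_risk = np.array([1.0 - m.score(X_tr, y_tr) for m in models])

# Placeholder posterior: weight decays with empirical risk. This is NOT the
# paper's bound-minimizing posterior; it only stands in for one.
Q = np.exp(-10.0 * (emp_risk - emp_risk.min()))
Q /= Q.sum()
P = np.full(len(C_grid), 1.0 / len(C_grid))  # uniform prior

# Divergences appearing in the bounds (standard discrete definitions).
chi2 = np.sum(Q**2 / P) - 1.0          # chi-squared divergence chi^2(Q||P)
kl = np.sum(Q * np.log(Q / P))         # KL-divergence KL(Q||P)

# Gibbs (stochastic) classifier: sample h ~ Q, then predict with h.
rng = np.random.default_rng(0)
h = models[rng.choice(len(models), p=Q)]
gibbs_test_error = np.mean(h.predict(X_te) != y_te)

print(f"chi^2(Q||P) = {chi2:.4f}, KL(Q||P) = {kl:.4f}")
print(f"Gibbs test error (one draw): {gibbs_test_error:.3f}")
```

Swapping the placeholder Q for a bound-minimizing posterior is where the paper's closed-form expression (linear distance) or fixed point iterations (squared distance, KL-divergence) would enter.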
