Paper Title

Maximum Entropy competes with Maximum Likelihood

Paper Authors

Allahverdyan, A. E., Martirosyan, N. H.

Paper Abstract

The maximum entropy (MAXENT) method has a large number of applications in theoretical and applied machine learning, since it provides a convenient non-parametric tool for estimating unknown probabilities. The method is a major contribution of statistical physics to probabilistic inference. However, a systematic approach to its limits of validity is currently missing. Here we study MAXENT in a Bayesian decision-theory setup, i.e. assuming that there exists a well-defined prior Dirichlet density for the unknown probabilities, and that the average Kullback-Leibler (KL) distance can be employed for deciding on the quality and applicability of various estimators. This setup allows us to evaluate the relevance of various MAXENT constraints, to check the method's general applicability, and to compare MAXENT with estimators having various degrees of dependence on the prior, viz. the regularized maximum likelihood (ML) estimator and the Bayesian estimator. We show that MAXENT applies in sparse-data regimes, but needs specific types of prior information. In particular, MAXENT can outperform the optimally regularized ML provided that there are prior rank correlations between the estimated random quantity and its probabilities.
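To make the abstract's comparison concrete, below is a minimal simulation sketch of the decision-theoretic setup it describes: true probabilities are drawn from a Dirichlet prior, a sparse sample is observed, and the regularized ML and MAXENT estimates are scored by the average KL distance. The alphabet size, sample count, regularization strength, the single mean-value constraint, and the sorting trick used to mimic prior rank correlations are all illustrative assumptions, not the paper's exact construction.

```python
# A rough sketch of the Bayesian evaluation setup described in the abstract.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)

n = 10                      # number of outcomes (assumed)
N = 5                       # sparse regime: fewer samples than outcomes
lam = 0.5                   # add-lambda regularization strength (assumed)
x = np.arange(1.0, n + 1)   # values of the estimated random quantity

def kl(p, q):
    """Quality measure from the abstract: KL distance D(p||q)."""
    p = np.clip(p, 1e-12, None)   # guard against numerical zeros
    return float(np.sum(p * np.log(p / q)))

def regularized_ml(counts, lam):
    """Add-lambda smoothed (regularized) ML estimate of the probabilities."""
    return (counts + lam) / (counts.sum() + lam * len(counts))

def maxent(x, xbar):
    """MAXENT estimate under the single constraint sum_i p_i x_i = xbar;
    the maximizer has the Gibbs form p_i ∝ exp(-beta * x_i)."""
    xbar = np.clip(xbar, x.min() + 1e-6, x.max() - 1e-6)
    def gap(beta):
        w = np.exp(-beta * (x - x.mean()))   # centered for stability
        return (w / w.sum()) @ x - xbar
    beta = brentq(gap, -50.0, 50.0)          # bracket is wide enough here
    w = np.exp(-beta * (x - x.mean()))
    return w / w.sum()

d_ml = d_me = 0.0
trials = 2000
for _ in range(trials):
    # Draw true probabilities from the prior; sorting them against x mimics
    # the prior rank correlation the abstract mentions (an assumption).
    p = np.sort(rng.dirichlet(np.ones(n)))[::-1]
    data = rng.choice(n, size=N, p=p)
    counts = np.bincount(data, minlength=n).astype(float)
    d_ml += kl(p, regularized_ml(counts, lam))
    d_me += kl(p, maxent(x, x[data].mean()))

print(f"avg KL, regularized ML: {d_ml / trials:.4f}")
print(f"avg KL, MAXENT:         {d_me / trials:.4f}")
```

Varying N, lam, and whether the Dirichlet draws are sorted lets one probe the qualitative point of the abstract: the constraint-based MAXENT estimate can only pay off when the prior actually couples the random quantity to its probabilities.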
