Improving Nonparametric Density Estimation with Tensor Decompositions

Paper title

Improving Nonparametric Density Estimation with Tensor Decompositions

Author

Vandermeulen, Robert A.

Abstract

While nonparametric density estimators often perform well on low-dimensional data, their performance can suffer when applied to higher-dimensional data due to the curse of dimensionality. One technique for avoiding this is to assume no dependence between features and that the data are sampled from a separable density. This allows one to estimate each marginal distribution independently, thereby avoiding the slow rates associated with estimating the full joint density. This is the strategy employed in naive Bayes models and is analogous to estimating a rank-one tensor. In this paper we investigate whether these improvements can be extended to other simplified dependence assumptions, which we model via nonnegative tensor decompositions. In our central theoretical results we prove that restricting estimation to low-rank nonnegative PARAFAC or Tucker decompositions removes the dimensionality exponent on bin width rates for multidimensional histograms. These results are validated experimentally with high statistical significance via direct application of existing nonnegative tensor factorization to histogram estimators.
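The analogy between a separable density and a rank-one tensor can be made concrete with a small sketch. The code below is an illustration only, not the paper's estimator or experimental setup: the function names, the Beta-distributed toy data, and the choice of 8 bins in 4 dimensions are all assumptions made for the example. It builds a full joint histogram and a rank-one histogram formed as the outer product of per-feature marginal histograms, then compares how many free parameters each one has to estimate.

```python
import numpy as np


def full_histogram(samples, bins):
    """Joint histogram on [0, 1]^d, normalized to bin probabilities."""
    d = samples.shape[1]
    counts, _ = np.histogramdd(samples, bins=bins, range=[(0.0, 1.0)] * d)
    return counts / counts.sum()


def rank_one_histogram(samples, bins):
    """Outer product of per-feature marginal histograms (a rank-one tensor)."""
    d = samples.shape[1]
    estimate = np.ones(())  # 0-d array holding 1.0, the empty outer product
    for j in range(d):
        counts, _ = np.histogram(samples[:, j], bins=bins, range=(0.0, 1.0))
        marginal = counts / counts.sum()
        estimate = np.multiply.outer(estimate, marginal)
    return estimate


# Hypothetical toy data: d independent Beta(2, 5) features, all inside (0, 1).
rng = np.random.default_rng(0)
d, bins, n = 4, 8, 2000
samples = rng.beta(2.0, 5.0, size=(n, d))

full = full_histogram(samples, bins)
rank1 = rank_one_histogram(samples, bins)

# Free parameters: every bin of the joint tensor versus one marginal per feature.
full_params = bins**d - 1
rank1_params = d * (bins - 1)
print(f"full histogram parameters:     {full_params}")
print(f"rank-one histogram parameters: {rank1_params}")
```

With these assumed settings the joint histogram has `bins**d - 1` free bin probabilities, while the rank-one version has only `d * (bins - 1)`, which is the intuition behind estimating marginals independently. A low-rank nonnegative PARAFAC or Tucker model sits between these two extremes; fitting one would require a nonnegative tensor factorization routine, which this sketch deliberately leaves out.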
