Paper Title
Inference for Projection-Based Wasserstein Distances on Finite Spaces
Authors
Abstract
The Wasserstein distance is a distance between two probability distributions and has recently gained popularity in statistics and machine learning owing to its attractive properties. One important approach to extending this distance uses low-dimensional projections of the distributions, as in the sliced Wasserstein and max-sliced Wasserstein distances, to avoid the high computational cost and the curse of dimensionality in empirical estimation. Despite their practical success in machine learning tasks, statistical inference for projection-based Wasserstein distances remains limited owing to the lack of distributional limit results. In this paper, we consider distances defined by integrating or maximizing Wasserstein distances between low-dimensional projections of two probability distributions. We then derive limit distributions for these distances when the two distributions are supported on finitely many points. We also propose a bootstrap procedure to estimate quantiles of the limit distributions from data. This facilitates asymptotically exact interval estimation and hypothesis testing for these distances. Our theoretical results build on the arguments of Sommerfeld and Munk (2018) for deriving distributional limits of the original Wasserstein distance on finite spaces and on the theory of sensitivity analysis in nonlinear programming. Finally, we conduct numerical experiments to illustrate the theoretical results and demonstrate the applicability of our inferential methods to real data analysis.
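
To make the "integrating or maximizing" construction concrete, the following is a minimal sketch and not the authors' implementation. It assumes both distributions are weight vectors `r` and `s` on the same finite set of points, uses the order-1 Wasserstein distance for simplicity, and approximates the integral and the maximum over directions with random unit vectors. The function names, the number of directions, and the toy data are all hypothetical choices for illustration.

```python
import numpy as np

def wasserstein1_1d(x, r, s):
    """Order-1 Wasserstein distance between weights r and s on shared 1-D points x.

    Uses the identity W_1 = integral of |F_r - F_s|, which is exact for
    discrete distributions on the real line.
    """
    x, r, s = np.asarray(x, float), np.asarray(r, float), np.asarray(s, float)
    order = np.argsort(x)
    x, r, s = x[order], r[order], s[order]
    cdf_gap = np.abs(np.cumsum(r - s))[:-1]  # |F_r - F_s| on each gap between points
    return float(np.sum(cdf_gap * np.diff(x)))

def projected_distances(points, r, s, n_directions=500, seed=0):
    """Monte Carlo approximations of sliced and max-sliced W_1 (assumed setup).

    Averaging over random directions approximates the integral over the unit
    sphere; taking the maximum over the same sample is only a crude lower
    bound on the true maximum over all directions.
    """
    rng = np.random.default_rng(seed)
    thetas = rng.normal(size=(n_directions, points.shape[1]))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)  # unit directions
    vals = np.array([wasserstein1_1d(points @ t, r, s) for t in thetas])
    return float(vals.mean()), float(vals.max())

# Toy example: two distributions supported on three points in the plane.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
r = np.array([0.5, 0.25, 0.25])
s = np.array([0.2, 0.4, 0.4])
sliced, max_sliced = projected_distances(points, r, s)
print(sliced, max_sliced)
```

In an empirical setting, `r` and `s` would be replaced by observed frequencies on the support points. The limit distributions and the bootstrap procedure described in the abstract concern such plug-in estimates and are not reproduced in this sketch.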