论文标题
非对称确定点过程的可扩展MCMC抽样
Scalable MCMC Sampling for Nonsymmetric Determinantal Point Processes
论文作者
论文摘要
确定点过程(DPP)是一个优雅的模型,可以为$ n $项目集合的每个子集分配概率。虽然传统上,DPP通过对称内核矩阵进行了参数化,从而消除了这种对称约束,从而导致非对称DPP(NDPP),从而导致建模功率和预测性能的显着改善。最近的工作研究了Markov Chain Monte Carlo(MCMC)对NDPPS的采样算法,该算法仅限于尺寸为$ K $ subset(称为$ K $ -NDPPS)。但是,这种方法的运行时间在$ n $中是二次的,这对于大规模设置而言是不可行的。在这项工作中,我们为$ k $ -ndpps提供了一个可扩展的MCMC采样算法,并具有低级别内核,从而使运行时可以在$ n $中获得sublinear。我们的方法基于最先进的NDPP排斥采样算法,我们通过一种新型的方法来有效地构建建议分布。此外,我们将可扩展的$ K $ -NDPP采样算法扩展到没有大小约束的情况下。我们由此产生的采样方法在内核等级中具有多项式时间的复杂性,而现有方法的运行时间在等级中为指数。通过对现实世界数据集进行的理论分析和实验,我们验证我们的可扩展近似采样算法比现有的$ k $ -ndpps和ndpps的现有采样方法快的阶数。
A determinantal point process (DPP) is an elegant model that assigns a probability to every subset of a collection of $n$ items. While conventionally a DPP is parameterized by a symmetric kernel matrix, removing this symmetry constraint, resulting in nonsymmetric DPPs (NDPPs), leads to significant improvements in modeling power and predictive performance. Recent work has studied an approximate Markov chain Monte Carlo (MCMC) sampling algorithm for NDPPs restricted to size-$k$ subsets (called $k$-NDPPs). However, the runtime of this approach is quadratic in $n$, making it infeasible for large-scale settings. In this work, we develop a scalable MCMC sampling algorithm for $k$-NDPPs with low-rank kernels, thus enabling runtime that is sublinear in $n$. Our method is based on a state-of-the-art NDPP rejection sampling algorithm, which we enhance with a novel approach for efficiently constructing the proposal distribution. Furthermore, we extend our scalable $k$-NDPP sampling algorithm to NDPPs without size constraints. Our resulting sampling method has polynomial time complexity in the rank of the kernel, while the existing approach has runtime that is exponential in the rank. With both a theoretical analysis and experiments on real-world datasets, we verify that our scalable approximate sampling algorithms are orders of magnitude faster than existing sampling approaches for $k$-NDPPs and NDPPs.