Scalable Gaussian-process regression and variable selection using Vecchia approximations

Paper title

Scalable Gaussian-process regression and variable selection using Vecchia approximations

Authors

Jian Cao, Joseph Guinness, Marc G. Genton, Matthias Katzfuss

Abstract

Gaussian process (GP) regression is a flexible, nonparametric approach to regression that naturally quantifies uncertainty. In many applications, the number of responses and covariates are both large, and a goal is to select covariates that are related to the response. For this setting, we propose a novel, scalable algorithm, coined VGPR, which optimizes a penalized GP log-likelihood based on the Vecchia GP approximation, an ordered conditional approximation from spatial statistics that implies a sparse Cholesky factor of the precision matrix. We traverse the regularization path from strong to weak penalization, sequentially adding candidate covariates based on the gradient of the log-likelihood and deselecting irrelevant covariates via a new quadratic constrained coordinate descent algorithm. We propose Vecchia-based mini-batch subsampling, which provides unbiased gradient estimators. The resulting procedure is scalable to millions of responses and thousands of covariates. Theoretical analysis and numerical studies demonstrate the improved scalability and accuracy relative to existing methods.
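The ordered conditional structure mentioned in the abstract can be illustrated with a small sketch. It is not the authors' VGPR implementation: the squared-exponential kernel, the per-covariate `relevance` scaling, the fixed `noise` value, the nearest-neighbor conditioning rule, and all function names are assumptions chosen for illustration, and the penalty, regularization path, and mini-batching steps are omitted. The sketch only shows how a Vecchia-type log-likelihood replaces the joint Gaussian density with a product of univariate conditionals, each conditioning on at most `m` previously ordered responses.

```python
import numpy as np

def sq_exp_kernel(X1, X2, relevance, variance=1.0):
    # Assumed kernel: squared exponential with one relevance (inverse range)
    # per covariate; a relevance of 0 removes that covariate from the model.
    d = (X1[:, None, :] - X2[None, :, :]) * relevance
    return variance * np.exp(-0.5 * np.sum(d ** 2, axis=-1))

def vecchia_loglik(y, X, relevance, m, noise=0.1):
    # Sum of log p(y_i | y_c(i)), where c(i) holds at most m earlier points.
    total = 0.0
    for i in range(len(y)):
        k_ii = sq_exp_kernel(X[i:i + 1], X[i:i + 1], relevance)[0, 0] + noise
        if i == 0:
            mean, var = 0.0, k_ii
        else:
            # Nearest previously ordered points in the relevance-scaled space.
            dist = np.sum(((X[:i] - X[i]) * relevance) ** 2, axis=1)
            nbrs = np.argsort(dist)[:m]
            K_nn = sq_exp_kernel(X[nbrs], X[nbrs], relevance)
            K_nn = K_nn + noise * np.eye(len(nbrs))
            k_in = sq_exp_kernel(X[i:i + 1], X[nbrs], relevance)[0]
            w = np.linalg.solve(K_nn, k_in)
            mean = w @ y[nbrs]          # conditional mean of y_i
            var = k_ii - w @ k_in       # conditional variance of y_i
        total += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total

def exact_loglik(y, X, relevance, noise=0.1):
    # Dense zero-mean GP log-likelihood, used only as a reference.
    K = sq_exp_kernel(X, X, relevance) + noise * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, y)
    return (-0.5 * alpha @ alpha - np.sum(np.log(np.diag(L)))
            - 0.5 * len(y) * np.log(2 * np.pi))

# Synthetic data (illustrative only): the response depends on covariate 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=60)
relevance = np.array([1.0, 0.0, 0.0])

approx = vecchia_loglik(y, X, relevance, m=5)
full = vecchia_loglik(y, X, relevance, m=len(y))
exact = exact_loglik(y, X, relevance)
```

When `m` is at least the number of responses, every point conditions on all earlier points, so the product of conditionals is the chain-rule factorization of the joint density and `full` matches `exact` up to floating-point error. A small `m` keeps each linear solve at size at most `m`, which is the source of the scalability the abstract refers to.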
