Paper Title
Data-Driven Linear Complexity Low-Rank Approximation of General Kernel Matrices: A Geometric Approach
Paper Authors
Paper Abstract
A general, {\em rectangular} kernel matrix may be defined as $K_{ij} = \kappa(x_i, y_j)$, where $\kappa(x,y)$ is a kernel function and where $X=\{x_i\}_{i=1}^m$ and $Y=\{y_j\}_{j=1}^n$ are two sets of points. In this paper, we seek a low-rank approximation to a kernel matrix where the point sets $X$ and $Y$ are large and arbitrarily distributed, for example far away from each other, ``intermingled'', identical, etc. Such rectangular kernel matrices may arise, for example, in Gaussian process regression, where $X$ corresponds to the training data and $Y$ corresponds to the test data. In this case, the points are often high-dimensional. Since the point sets are large, we must exploit the fact that the matrix arises from a kernel function and avoid forming the matrix explicitly, which rules out most algebraic techniques. In particular, we seek methods that scale linearly or nearly linearly with the size of the data for a fixed approximation rank. The main idea in this paper is to {\em geometrically} select appropriate subsets of points to construct a low-rank approximation. An analysis in this paper guides how this selection should be performed.
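To make the setting concrete, below is a minimal Python sketch of a geometrically driven low-rank approximation under the assumptions of the abstract. It is not the paper's algorithm: the selection here is plain farthest-point sampling used as a stand-in geometric heuristic, and the names `gaussian_kernel`, `farthest_point_sampling`, and `low_rank_kernel_approx` are illustrative. The sketch builds a cross-type (Nyström/CUR-style) factorization $K \approx U C V$ from $r$ geometrically selected points of $X$ and $Y$, so only $O((m+n)r)$ kernel evaluations are needed and the full $m \times n$ matrix is never formed.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # kappa(x, y) = exp(-||x - y||^2 / (2 sigma^2)), evaluated pairwise.
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma**2))

def farthest_point_sampling(P, r, seed=0):
    # Greedily pick r indices of points that are geometrically spread out.
    # Each step is one pass of distances, so the cost is O(r * len(P)).
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(P)))]
    d = np.linalg.norm(P - P[idx[0]], axis=1)
    for _ in range(r - 1):
        idx.append(int(d.argmax()))
        d = np.minimum(d, np.linalg.norm(P - P[idx[-1]], axis=1))
    return np.array(idx)

def low_rank_kernel_approx(X, Y, kernel, r):
    # Cross-type approximation K ~= U @ C @ V built from r geometrically
    # selected "skeleton" points of X (rows) and of Y (columns). Only
    # (m + n + r) * r kernel evaluations are used; the m-by-n matrix
    # itself is never formed.
    I = farthest_point_sampling(X, r, seed=0)
    J = farthest_point_sampling(Y, r, seed=1)
    U = kernel(X, Y[J])                     # m x r
    V = kernel(X[I], Y)                     # r x n
    C = np.linalg.pinv(kernel(X[I], Y[J]))  # r x r core (pseudoinverse)
    return U, C, V

# Usage: two intermingled 3-D point clouds.
rng = np.random.default_rng(42)
X, Y = rng.standard_normal((1500, 3)), rng.standard_normal((1200, 3))
U, C, V = low_rank_kernel_approx(X, Y, gaussian_kernel, r=40)
K = gaussian_kernel(X, Y)   # formed here only to measure the error
err = np.linalg.norm(K - U @ C @ V) / np.linalg.norm(K)
print(f"relative error at rank 40: {err:.2e}")
```

For a fixed rank $r$, the work is linear in $m+n$; the paper's contribution, by contrast, is an analysis of how the skeleton points should be selected, which this sketch replaces with a generic spread-out heuristic.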