Paper title

Diversity sampling is an implicit regularization for kernel methods

Authors

Michaël Fanuel, Joachim Schreurs, Johan A. K. Suykens

Abstract

Kernel methods have achieved very good performance on large-scale regression and classification problems by using the Nyström method and preconditioning techniques. The Nyström approximation -- based on a subset of landmarks -- gives a low-rank approximation of the kernel matrix, and is known to provide a form of implicit regularization. We further elaborate on the impact of sampling diverse landmarks for constructing the Nyström approximation in supervised as well as unsupervised kernel methods. By using Determinantal Point Processes for sampling, we obtain additional theoretical results concerning the interplay between diversity and regularization. Empirically, we demonstrate the advantages of training kernel methods based on subsets made of diverse points. In particular, if the dataset has a dense bulk and a sparser tail, we show that Nyström kernel regression with diverse landmarks increases the accuracy of the regression in sparser regions of the dataset, compared with uniform landmark sampling. A greedy heuristic is also proposed to select diverse samples of significant size within large datasets when exact DPP sampling is not practically feasible.
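
To make the landmark-based construction in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds a Nyström approximation from a chosen landmark subset and selects diverse landmarks with a generic greedy determinant-maximizing rule (equivalent to pivoted Cholesky), which is an assumption standing in for the paper's heuristic. The function names, the Gaussian kernel, the bandwidth, and the toy "dense bulk plus sparse tail" data are all illustrative choices.

```python
import numpy as np


def rbf_kernel(X, Y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def nystrom_approximation(K, landmarks):
    """Low-rank Nystrom approximation K[:, C] @ pinv(K[C, C]) @ K[C, :]."""
    K_nc = K[:, landmarks]
    K_cc = K[np.ix_(landmarks, landmarks)]
    return K_nc @ np.linalg.pinv(K_cc) @ K_nc.T


def greedy_diverse_landmarks(K, k, tol=1e-12):
    """Greedily pick up to k indices, each time taking the point with the
    largest residual diagonal (pivoted Cholesky). This greedily increases
    det(K[C, C]); it is a stand-in, not the heuristic from the paper."""
    n = K.shape[0]
    residual_diag = np.diag(K).astype(float).copy()
    L = np.zeros((n, k))
    selected = []
    for j in range(k):
        i = int(np.argmax(residual_diag))
        if residual_diag[i] <= tol:
            break  # kernel matrix is numerically exhausted
        selected.append(i)
        # Column i of the residual kernel after removing earlier landmarks.
        col = K[:, i] - L[:, :j] @ L[i, :j]
        L[:, j] = col / np.sqrt(residual_diag[i])
        residual_diag = np.maximum(residual_diag - L[:, j] ** 2, 0.0)
    return selected


# Toy data (assumed for illustration): a dense bulk and a sparser tail.
rng = np.random.default_rng(0)
bulk = rng.normal(scale=0.3, size=(200, 2))
tail = rng.normal(scale=3.0, size=(20, 2))
X = np.vstack([bulk, tail])

K = rbf_kernel(X, X, bandwidth=1.0)
landmarks = greedy_diverse_landmarks(K, k=8)
K_hat = nystrom_approximation(K, landmarks)
```

Passing uniformly drawn indices, for example `rng.choice(len(X), size=8, replace=False)`, to `nystrom_approximation` instead gives the uniform-sampling baseline that the abstract compares against.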
