论文标题

在排除少量图中及以后的聚类的核心

Coresets for Clustering in Excluded-minor Graphs and Beyond

论文作者

Braverman, Vladimir, Jiang, Shaofeng H. -C., Krauthgamer, Robert, Wu, Xuan

论文摘要

核心是现代数据还原工具,可在数据分析中广泛使用,以提高运行时间,空间和通信复杂性方面的效率。我们的主要结果是一种快速算法,用于在(最短路径指标)中构建一个小核心图。具体来说,我们给出了第一个大小的核心,仅取决于$ k $,$ε$和排除的少量大小,我们的运行时间是准线性(以输入图的大小)。 我们新算法中的主要创新是迭代的。它首先将$ n $输入点减少到大约$ o(\ log n)$重新加权点,然后将$ o(\ log \ log n)$等等,依此类推,直到大小独立于$ n $。这种迭代尺寸降低的每个步骤均基于Feldman和Langberg的重要性采样框架(STOC 2011),具有至关重要的适应性,可以通过使用终端嵌入来减少\ emph {不同的点}的数量(仅在每个末端到所有其他点,都可以保证低畸变。我们的终端嵌入在技术上涉及,并依赖于最短路径分离器,这是平面和排除少量图的标准工具。 此外,我们的新算法也仅使用Narayanan和Nelson的最终嵌入结果,即扩展了Johnson-Lindenstrauss Lemma,也仅适用于欧几里得指标。因此,我们在高维欧几里德空间中获得了有效的核心结构,从而匹配并简化了最新结果(Sohler and Woodruff,Focs,2018年; Huang和Vishnoi,STOC 2020)。 此外,我们还采用了带有添加剂失真的末端嵌入,以在具有界限的高速公路尺寸的图中获得小核,并使用核心集的应用来获得改进的近似方案,例如,通过新的centroid集合,将平面K-Median的改进的PTA进行。

Coresets are modern data-reduction tools that are widely used in data analysis to improve efficiency in terms of running time, space and communication complexity. Our main result is a fast algorithm to construct a small coreset for k-Median in (the shortest-path metric of) an excluded-minor graph. Specifically, we give the first coreset of size that depends only on $k$, $ε$ and the excluded-minor size, and our running time is quasi-linear (in the size of the input graph). The main innovation in our new algorithm is that is iterative; it first reduces the $n$ input points to roughly $O(\log n)$ reweighted points, then to $O(\log\log n)$, and so forth until the size is independent of $n$. Each step in this iterative size reduction is based on the importance sampling framework of Feldman and Langberg (STOC 2011), with a crucial adaptation that reduces the number of \emph{distinct points}, by employing a terminal embedding (where low distortion is guaranteed only for the distance from every terminal to all other points). Our terminal embedding is technically involved and relies on shortest-path separators, a standard tool in planar and excluded-minor graphs. Furthermore, our new algorithm is applicable also in Euclidean metrics, by simply using a recent terminal embedding result of Narayanan and Nelson, (STOC 2019), which extends the Johnson-Lindenstrauss Lemma. We thus obtain an efficient coreset construction in high-dimensional Euclidean spaces, thereby matching and simplifying state-of-the-art results (Sohler and Woodruff, FOCS 2018; Huang and Vishnoi, STOC 2020). In addition, we also employ terminal embedding with additive distortion to obtain small coresets in graphs with bounded highway dimension, and use applications of our coresets to obtain improved approximation schemes, e.g., an improved PTAS for planar k-Median via a new centroid set.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源