论文标题
通用指标的一致k群集
Consistent k-Clustering for General Metrics
论文作者
论文摘要
给定度量空间中的点流,是否可以通过在整个算法执行过程中仅更改群集中心来维持恒定的近似聚类?近年来,这个问题在机器学习文献中受到了关注,在我们工作之前,最著名的算法执行$ \ widetilde {o}(k^2)$中心掉期($ \ widetilde {o}(\ cdot)$注释隐藏了polygarithmic因子$ n $ n $ n $ n $ n $ nidea Hives $ n $ and the factio $Δ$δ与离线情况相比,这是二次增加的增长 - 整个流是事先知道的,并且有兴趣在任何时间点保持恒定近似值 - $ \ \\\hdetilde {o}(k)$交换已知足够简单的例子表明$ω(k \ log log(k \ log(nΔ))$ swaps $ swaps是必要的。我们通过开发一种算法来缩小这一差距,该算法可能令人惊讶地与离线环境中的保证相匹配。具体来说,我们通过执行最佳(到polygarithimimic因子)数字$ \ widetilde {o}(k)$来维持$ k $ - 媒体问题的恒定因素近似。为了获得我们的结果,我们利用$ K $ -Median聚类的新结构属性可能具有独立的兴趣。
Given a stream of points in a metric space, is it possible to maintain a constant approximate clustering by changing the cluster centers only a small number of times during the entire execution of the algorithm? This question received attention in recent years in the machine learning literature and, before our work, the best known algorithm performs $\widetilde{O}(k^2)$ center swaps (the $\widetilde{O}(\cdot)$ notation hides polylogarithmic factors in the number of points $n$ and the aspect ratio $Δ$ of the input instance). This is a quadratic increase compared to the offline case -- the whole stream is known in advance and one is interested in keeping a constant approximation at any point in time -- for which $\widetilde{O}(k)$ swaps are known to be sufficient and simple examples show that $Ω(k \log(n Δ))$ swaps are necessary. We close this gap by developing an algorithm that, perhaps surprisingly, matches the guarantees in the offline setting. Specifically, we show how to maintain a constant-factor approximation for the $k$-median problem by performing an optimal (up to polylogarithimic factors) number $\widetilde{O}(k)$ of center swaps. To obtain our result we leverage new structural properties of $k$-median clustering that may be of independent interest.