论文标题

适应$ k $ -MEANS算法的异常值

Adapting $k$-means algorithms for outliers

论文作者

Grunau, Christoph, Rozhoň, Václav

论文摘要

本文展示了如何使用$ k $ - ameans问题的几种简单且经典的算法与异常值进行设置。 最近,Bhaskara等人。 (Neurips 2019)展示了如何将古典$ K $ -MEANS ++算法适应与异常值的设置。但是,他们的算法需要输出$ o(\ log(k)\ cdot z)$ outiers,其中$ z $是true Outliers的数量,以匹配$ o(\ log k)$ - $ k $ -means ++的近似保证。 In this paper, we build on their ideas and show how to adapt several sequential and distributed $k$-means algorithms to the setting with outliers, but with substantially stronger theoretical guarantees: our algorithms output $(1+\varepsilon)z$ outliers while achieving an $O(1 / \varepsilon)$-approximation to the objective function.在顺序世界中,我们通过改编Lattanzi和Sohler的最新算法来实现这一目标(ICML 2019)。在分布式设置中,我们适应了Guha等人的简单算法。 (IEEE Trans。知道和数据工程2003)以及Bahmani等人的流行$ K $ -MEANS $ \ | $。 (PVLDB 2012)。 我们的技术的理论应用是一种具有运行时间$ \ tilde {o}(nk^2/z)$的算法,它在输出$ o(z)$ outliers时,可以实现目标函数的$ O(1)$ - 近似值,假设$ o(z)$ outliers,则假设$ k \ ll z \ ll z \ ll z \ ll n $。这与Oracle模型中此问题的$ω(NK^2/z)$的匹配下限相互补。

This paper shows how to adapt several simple and classical sampling-based algorithms for the $k$-means problem to the setting with outliers. Recently, Bhaskara et al. (NeurIPS 2019) showed how to adapt the classical $k$-means++ algorithm to the setting with outliers. However, their algorithm needs to output $O(\log (k) \cdot z)$ outliers, where $z$ is the number of true outliers, to match the $O(\log k)$-approximation guarantee of $k$-means++. In this paper, we build on their ideas and show how to adapt several sequential and distributed $k$-means algorithms to the setting with outliers, but with substantially stronger theoretical guarantees: our algorithms output $(1+\varepsilon)z$ outliers while achieving an $O(1 / \varepsilon)$-approximation to the objective function. In the sequential world, we achieve this by adapting a recent algorithm of Lattanzi and Sohler (ICML 2019). In the distributed setting, we adapt a simple algorithm of Guha et al. (IEEE Trans. Know. and Data Engineering 2003) and the popular $k$-means$\|$ of Bahmani et al. (PVLDB 2012). A theoretical application of our techniques is an algorithm with running time $\tilde{O}(nk^2/z)$ that achieves an $O(1)$-approximation to the objective function while outputting $O(z)$ outliers, assuming $k \ll z \ll n$. This is complemented with a matching lower bound of $Ω(nk^2/z)$ for this problem in the oracle model.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源