Paper title
Automated Clustering of High-dimensional Data with a Feature Weighted Mean Shift Algorithm
Paper authors
Abstract
Mean shift is a simple iterative procedure that gradually shifts data points towards the mode, which denotes the highest density of data points in the region. Mean shift algorithms have been used effectively for data denoising, mode seeking, and automatically finding the number of clusters in a dataset. However, the merits of mean shift quickly fade away as the data dimension increases and only a handful of features contain useful information about the cluster structure of the data. We propose a simple yet elegant feature-weighted variant of mean shift that efficiently learns feature importance, thereby extending the merits of mean shift to high-dimensional data. The resulting algorithm not only outperforms the conventional mean shift clustering procedure but also preserves its computational simplicity. In addition, the proposed method comes with rigorous theoretical convergence guarantees and a convergence rate of at least cubic order. The efficacy of our proposal is thoroughly assessed through experimental comparison against baseline and state-of-the-art clustering methods on synthetic as well as real-world datasets.
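To make the idea concrete, here is a minimal NumPy sketch of a feature-weighted mean shift. It is an illustration under stated assumptions, not the paper's algorithm. The sketch assumes a blurring variant, in which the data points themselves are moved at every step, with a Gaussian kernel over a feature-weighted squared distance. It also assumes a hypothetical entropy-regularized weight update, `w_l ∝ exp(-D_l / lam)`, where `D_l` is the dispersion of feature `l` around the shifted points, so that features with small within-cluster spread receive large weights. The function names `weighted_blurring_mean_shift` and `assign_clusters` and the parameters `h`, `lam`, `n_iter`, and `tol` are all hypothetical.

```python
import numpy as np


def weighted_blurring_mean_shift(X, h=1.0, lam=1.0, n_iter=20):
    """Sketch of a feature-weighted blurring mean shift.

    ASSUMPTION: the Gaussian kernel on a weighted distance and the
    exp(-D / lam) weight update are illustrative choices, not the
    paper's exact update rules.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Y = X.copy()                 # points that get shifted (blurring variant)
    w = np.full(p, 1.0 / p)      # feature weights, initialised uniformly
    for _ in range(n_iter):
        # Feature-weighted squared distances between all current points.
        diff = Y[:, None, :] - Y[None, :, :]            # shape (n, n, p)
        d2 = np.einsum("ijk,k->ij", diff ** 2, w)       # shape (n, n)
        K = np.exp(-d2 / (2.0 * h ** 2))                # Gaussian kernel
        # Mean shift step: move every point to its kernel-weighted mean.
        Y_new = K @ Y / K.sum(axis=1, keepdims=True)
        # Per-feature dispersion of the data around the shifted points.
        D = ((X - Y_new) ** 2).sum(axis=0)
        # Low dispersion -> high weight; subtracting D.min() avoids underflow.
        w = np.exp(-(D - D.min()) / lam)
        w /= w.sum()
        Y = Y_new
    return Y, w


def assign_clusters(Y, tol=1e-3):
    """Greedily group shifted points whose positions lie within tol."""
    labels = -np.ones(len(Y), dtype=int)
    centers = []
    for i, y in enumerate(Y):
        for c, m in enumerate(centers):
            if np.linalg.norm(y - m) < tol:
                labels[i] = c
                break
        else:
            centers.append(y)
            labels[i] = len(centers) - 1
    return labels


# Usage: two clusters separated along feature 0; feature 1 is pure noise.
rng = np.random.default_rng(0)
a = np.column_stack([rng.normal(0.0, 0.5, 50), rng.normal(0.0, 2.0, 50)])
b = np.column_stack([rng.normal(10.0, 0.5, 50), rng.normal(0.0, 2.0, 50)])
X = np.vstack([a, b])
Y, w = weighted_blurring_mean_shift(X, h=2.0, lam=10.0, n_iter=30)
labels = assign_clusters(Y)
print("feature weights:", w)
print("number of clusters:", len(set(labels.tolist())))
```

Because the number of clusters is read off from the points the data collapse onto, no cluster count has to be supplied in advance, in line with the automated clustering described above. In this sketch, `D` is a sum over all `n` points, so `lam` has to be chosen relative to the sample size.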