Paper title
A new measure for assessment of clustering based on kernel density estimation
Paper authors
Abstract
A new clustering accuracy measure is proposed to determine the unknown number of clusters and to assess the quality of a clustering of a data set in a space of any dimension. Our validity index applies the classical nonparametric univariate kernel density estimation method to the interpoint distances computed between the members of the data set. Because it is based on interpoint distances, it is free of the curse of dimensionality and is therefore efficiently computable in high-dimensional situations where the number of study variables can exceed the sample size. The proposed measure is compatible with any clustering algorithm and with any kind of data set for which an interpoint distance can be defined whose distribution has a density function. A simulation study demonstrates its superiority over widely used cluster validity indices such as the average silhouette width and the Dunn index, and its applicability is shown through a high-dimensional biostatistical study of the Alon data set and a large astrostatistical application to time series of light curves of new variable stars.
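The central idea of the abstract can be illustrated with a minimal Python sketch. A univariate kernel density estimate is applied to interpoint distances, so once the distances are computed the density estimation is one-dimensional no matter how many variables there are. The sketch rests on several assumptions that do not come from the abstract:

- Euclidean distances.
- A Gaussian kernel with the normal-reference bandwidth h = 1.06 * sd * n^(-1/5).
- Distances split into within-cluster and between-cluster sets.
- A score, `density_overlap` (a hypothetical name), equal to the area shared by the two estimated densities.

This score is a stand-in chosen only for illustration. It is not the validity index defined in the paper.

```python
import math
import random
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two points of any dimension p."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def split_distances(points, labels):
    """Split all n(n-1)/2 interpoint distances into within- and between-cluster sets."""
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = euclidean(points[i], points[j])
        (within if labels[i] == labels[j] else between).append(d)
    return within, between


def rule_of_thumb_bandwidth(sample):
    """Normal-reference bandwidth h = 1.06 * sd * n^(-1/5); needs at least two values."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return 1.06 * max(sd, 1e-12) * n ** (-0.2)


def gaussian_kde(sample):
    """Univariate Gaussian KDE: f_hat(t) = 1/(n h) * sum_i phi((t - x_i) / h)."""
    h = rule_of_thumb_bandwidth(sample)
    coef = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))

    def f_hat(t):
        return coef * sum(math.exp(-0.5 * ((t - x) / h) ** 2) for x in sample)

    return f_hat, h


def density_overlap(points, labels, grid_size=1000):
    """HYPOTHETICAL score: integral over t of min(f_within(t), f_between(t)).

    Near 0 when within-cluster distances are clearly smaller than
    between-cluster ones. Illustration only, not the paper's index.
    """
    within, between = split_distances(points, labels)
    f_w, h_w = gaussian_kde(within)
    f_b, h_b = gaussian_kde(between)
    pad = 4 * max(h_w, h_b)
    lo = min(within + between) - pad
    hi = max(within + between) + pad
    step = (hi - lo) / grid_size
    # Midpoint-rule integration on a 1-D grid: this cost does not depend on p.
    return step * sum(
        min(f_w(lo + (k + 0.5) * step), f_b(lo + (k + 0.5) * step))
        for k in range(grid_size)
    )


# Usage: two Gaussian clusters with more variables (p = 50) than observations (n = 20).
random.seed(0)
p, n_per = 50, 10
points = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n_per)]
points += [[random.gauss(3.0, 1.0) for _ in range(p)] for _ in range(n_per)]
true_labels = [0] * n_per + [1] * n_per
wrong_labels = [i % 2 for i in range(2 * n_per)]  # mixes both true clusters

good = density_overlap(points, true_labels)
bad = density_overlap(points, wrong_labels)
print(good)  # close to 0: within and between distances barely overlap
print(bad)   # much larger: both sets mix short and long distances
```

The design choice worth noting is that the 50-dimensional data enter only through `euclidean`. Everything after that is a one-dimensional density problem on at most n(n-1)/2 numbers. This is the property the abstract credits for avoiding the curse of dimensionality.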