Paper title
A new nonparametric interpoint distance-based measure for assessment of clustering
Paper authors
Paper abstract
A new interpoint distance-based measure is proposed for identifying the optimal number of clusters in a data set. The measure is nonparametric, so it does not depend on the distribution of the data. Because it relies only on the interpoint distances between observations, the cluster validity index applies to univariate and multivariate data measured on arbitrary scales, and to observations in a space of any dimension, including cases where the number of study variables exceeds the sample size. The proposed criterion is compatible with any clustering algorithm and can be used either to determine an unknown number of clusters or to assess the quality of the clusters obtained for a data set. Demonstrations on synthetic and real-life data establish its superiority over well-known clustering accuracy measures from the literature.
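The abstract does not state the formula of the proposed index, so the sketch below only illustrates the general workflow it describes: compute interpoint distances, cluster for several candidate numbers of clusters, score each result with a distance-based validity index, and keep the best-scoring candidate. As a stand-in it uses the mean silhouette width, a well-known index that needs only the distance matrix; it is not the measure proposed in the paper. The function names, the average-linkage clustering step, and the toy data are all assumptions made for illustration.

```python
import math
from itertools import combinations


def distance_matrix(points):
    """Euclidean interpoint distances (any dissimilarity could be used instead)."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = math.dist(points[i], points[j])
    return d


def average_linkage(d, k):
    """Placeholder clustering step: merge the closest clusters until k remain."""
    clusters = [[i] for i in range(len(d))]

    def gap(pair):
        a, b = clusters[pair[0]], clusters[pair[1]]
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        p, q = min(combinations(range(len(clusters)), 2), key=gap)
        merged = clusters.pop(q)  # q > p, so index p stays valid
        clusters[p].extend(merged)
    labels = [0] * len(d)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def mean_silhouette(d, labels):
    """Stand-in validity index (NOT the paper's measure); needs >= 2 clusters."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = groups[lab]
        if len(own) == 1:
            continue  # a singleton contributes 0 by convention
        a = sum(d[i][j] for j in own) / (len(own) - 1)
        b = min(sum(d[i][j] for j in g) / len(g)
                for other, g in groups.items() if other != lab)
        m = max(a, b)
        total += (b - a) / m if m > 0 else 0.0
    return total / len(labels)


def choose_k(points, candidates):
    """Score each candidate number of clusters and return the best one."""
    d = distance_matrix(points)
    scores = {k: mean_silhouette(d, average_linkage(d, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Toy data: three well-separated groups in the plane.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
best_k, scores = choose_k(points, range(2, 6))
print(best_k, scores)
```

Because both the clustering step and the index take only the distance matrix as input, either can be swapped out independently, which mirrors the abstract's point that a distance-based criterion can be paired with any clustering algorithm.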