Paper Title
Clustering through Feature Space Sequence Discovery and Analysis
Authors
Abstract
Identifying patterns in high-dimensional data without a priori knowledge is an important task in data science. This paper proposes a simple and efficient nonparametric algorithm, Data Convert to Sequence Analysis (DCSA), which dynamically explores each point in the feature space without repetition and thereby finds a directed Hamiltonian path. Based on change point analysis theory, the sequence corresponding to the path is cut into several fragments to achieve clustering. Experiments on real-world datasets from different fields, with dimensions ranging from 4 to 20531, confirm that the method is robust and offers visual interpretability in result analysis.
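The two stages named in the abstract (a path that visits every point once, then cutting the resulting sequence) can be illustrated with a minimal sketch. This is not the DCSA algorithm itself, because the abstract does not specify its traversal rule or its change point test. The sketch assumes a greedy nearest-neighbor walk for the path and replaces change point analysis with a simple "cut at the largest steps" rule. The function names and the `n_clusters` argument are hypothetical; a nonparametric method would not require the number of clusters.

```python
import numpy as np


def nearest_neighbor_path(X, start=0):
    """Visit every point exactly once, always moving to the closest
    unvisited point (an assumed traversal rule, not the paper's)."""
    n = len(X)
    visited = np.zeros(n, dtype=bool)
    visited[start] = True
    order = [start]
    steps = []  # steps[i] = distance from order[i] to order[i + 1]
    for _ in range(n - 1):
        d = np.linalg.norm(X - X[order[-1]], axis=1)
        d[visited] = np.inf  # never revisit a point
        nxt = int(np.argmin(d))
        order.append(nxt)
        steps.append(float(d[nxt]))
        visited[nxt] = True
    return np.array(order), np.array(steps)


def cut_sequence(order, steps, n_clusters):
    """Cut the path at the (n_clusters - 1) largest steps.
    A stand-in for change point analysis, assumed for illustration."""
    if n_clusters > 1:
        cuts = np.sort(np.argsort(steps)[-(n_clusters - 1):])
    else:
        cuts = np.array([], dtype=int)
    # A path position p lies in the segment numbered by how many cuts precede it.
    segment = np.searchsorted(cuts, np.arange(len(order)), side="left")
    labels = np.empty(len(order), dtype=int)
    labels[order] = segment
    return labels


# Two well-separated synthetic blobs (illustrative data, not from the paper).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(10.0, 0.1, (20, 2))])
order, steps = nearest_neighbor_path(X)
labels = cut_sequence(order, steps, n_clusters=2)
```

Plotting `steps` against path position shows one spike where the walk jumps between blobs, which gives an intuition for why a sequence view of the data can be easy to inspect visually.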