Paper title
Fuzzy c-Means Clustering for Persistence Diagrams
Paper authors
Abstract
Persistence diagrams concisely represent the topology of a point cloud whilst having strong theoretical guarantees, but the question of how best to integrate this information into machine learning workflows remains open. In this paper we extend the ubiquitous Fuzzy c-Means (FCM) clustering algorithm to the space of persistence diagrams, enabling unsupervised learning that automatically captures the topological structure of data without the topological prior knowledge or additional processing of persistence diagrams that many other techniques require. We give theoretical convergence guarantees that correspond to the Euclidean case, and empirically demonstrate the capability of our algorithm to capture topological information via the fuzzy Rand index. We end with experiments on two datasets that utilise both the topological and fuzzy nature of our algorithm: pre-trained model selection in machine learning and lattice structures from materials science. As pre-trained models can perform well on multiple tasks, selecting the best model is a naturally fuzzy problem; we show that fuzzy clustering of persistence diagrams allows for model selection using the topology of decision boundaries. In materials science, we classify transformed lattice structure datasets for the first time, whilst the probabilistic membership values let us rank candidate lattices in a scenario where further investigation requires expensive laboratory time and expertise.
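For readers unfamiliar with FCM, the following is a minimal sketch of the standard Euclidean algorithm that the abstract takes as its starting point. It is not the persistence-diagram extension described in the paper: the function name `fuzzy_c_means`, the fuzzifier default `m=2.0`, the stopping tolerance, and the toy two-blob data are all illustrative assumptions. The sketch alternates between updating cluster centres as membership-weighted means and updating the soft membership values from the point-to-centre distances.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard Euclidean fuzzy c-means (illustrative; not the paper's variant)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each row is normalised to sum to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centres are means of the points weighted by memberships raised to m.
        centres = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every point to every centre, floored to avoid division by zero.
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)
        # Membership update: u_ij is proportional to d_ij^(-2 / (m - 1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centres, U

# Toy data (assumed for illustration): two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
centres, U = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=1)  # hardened labels; U itself holds the soft memberships
```

Each row of `U` sums to one, so its entries can be read as degrees of membership. Membership values of this kind are what the abstract refers to when it describes ranking candidate lattices.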