Paper title

Euler Characteristic Curves and Profiles: a stable shape invariant for big data problems

Authors

Paweł Dłotko, Davide Gurnari

Abstract

Tools of Topological Data Analysis provide stable summaries encapsulating the shape of the considered data. Persistent homology, the most standard and well-studied data summary, suffers from a number of limitations: its computations are hard to distribute, it is hard to generalize to multifiltrations, and it is computationally prohibitive for big data sets. In this paper we study the concept of Euler Characteristic Curves, for one-parameter filtrations, and Euler Characteristic Profiles, for multi-parameter filtrations. Although they are a weaker invariant in one dimension, we show that Euler Characteristic based approaches do not possess some of the handicaps of persistent homology; we show efficient algorithms to compute them in a distributed way, their generalization to multifiltrations, and their practical applicability to big data problems. In addition, we show that the Euler Curves and Profiles enjoy a certain type of stability, which makes them a robust tool in data analysis. Lastly, to show their practical applicability, multiple use cases are considered.
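
To make the central object concrete: for a one-parameter filtration, the Euler characteristic at threshold t is the alternating count of simplices that have entered the filtration by t, so each simplex of dimension d contributes (-1)^d from its filtration value onward. The sketch below illustrates that standard definition in plain Python; it is not the authors' algorithm, and the function names (`local_jumps`, `merge_jumps`, `curve_from_jumps`) and the toy input are hypothetical, chosen only for illustration. Because the contributions are purely additive, they can be tallied on separate chunks of the simplex list and summed afterwards, which hints at why this kind of computation is easy to distribute.

```python
from collections import Counter

def local_jumps(simplices):
    """Tally Euler characteristic jumps for one chunk of simplices.

    `simplices` is an iterable of (vertex_tuple, filtration_value) pairs.
    A simplex with k+1 vertices has dimension k and contributes (-1)**k.
    """
    jumps = Counter()
    for vertices, value in simplices:
        dim = len(vertices) - 1
        jumps[value] += (-1) ** dim
    return jumps

def merge_jumps(*partials):
    """Sum the per-chunk tallies (the only step that needs communication)."""
    total = Counter()
    for partial in partials:
        for value, jump in partial.items():
            total[value] += jump
    return total

def curve_from_jumps(jumps):
    """Accumulate jumps in filtration order into (value, chi) pairs."""
    curve, chi = [], 0
    for value in sorted(jumps):
        chi += jumps[value]
        curve.append((value, chi))
    return curve

# Toy filtration of a filled triangle (illustrative data, not from the paper):
# three vertices at 0.0, two edges at 1.0, the last edge at 2.0, the face at 3.0.
triangle = [
    ((0,), 0.0), ((1,), 0.0), ((2,), 0.0),
    ((0, 1), 1.0), ((1, 2), 1.0), ((0, 2), 2.0),
    ((0, 1, 2), 3.0),
]

# Process the list as a whole, and as two independent chunks merged afterwards.
ecc_whole = curve_from_jumps(local_jumps(triangle))
ecc_chunked = curve_from_jumps(
    merge_jumps(local_jumps(triangle[:4]), local_jumps(triangle[4:]))
)
print(ecc_whole)
```

In this toy case the curve reads 3 (three isolated points), 1 (a connected path), 0 (a closed loop), and 1 (a filled disc), and the chunked computation yields the same curve as the single pass.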