Paper Title

A Mathematical Assessment of the Isolation Tree Method for Outliers Detection in Big Data

Authors

Morales, Fernando A., Ramírez, Jorge M., Ramos, Edgar A.

Abstract

This paper presents a mathematical analysis of the Isolation Random Forest method (IRF method) for anomaly detection. We show that the IRF space can be endowed with a probability induced by the Isolation Tree algorithm (iTree). In this setting, the convergence of the IRF method is proved using the Law of Large Numbers. A couple of counterexamples show that the original method is inconclusive and that no quality certificate can be given when it is used to detect anomalies. Hence, an alternative version of IRF is proposed, whose mathematical foundation and limitations are fully justified. Finally, numerical experiments compare the performance of the classic IRF with that of the proposed version.
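To make the convergence statement concrete, the following is a minimal sketch of the classic isolation-depth idea, not the authors' proposed variant. It assumes a simplified iTree that splits on a uniformly chosen axis at a uniformly chosen threshold and uses the full data set rather than a subsample. The names `isolation_depth`, `mean_depth`, `data`, `inlier`, and `outlier` are hypothetical and chosen only for illustration. Averaging the depth over many independent random trees is the quantity to which a Law of Large Numbers argument would apply.

```python
import random


def isolation_depth(point, data, rng, depth=0, max_depth=50):
    """Number of random axis-aligned splits needed to isolate `point` in `data`.

    Assumes `point` is one of the rows of `data`.
    """
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(point))          # random split axis
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:
        return depth                         # simplification: stop if the axis is degenerate
    split = rng.uniform(lo, hi)              # random split threshold
    # Keep only the rows that fall on the same side of the split as `point`.
    if point[dim] < split:
        side = [row for row in data if row[dim] < split]
    else:
        side = [row for row in data if row[dim] >= split]
    return isolation_depth(point, side, rng, depth + 1, max_depth)


def mean_depth(point, data, n_trees, seed=0):
    """Average isolation depth of `point` over `n_trees` independent random trees."""
    rng = random.Random(seed)
    total = sum(isolation_depth(point, data, rng) for _ in range(n_trees))
    return total / n_trees


# Illustrative data (an assumption, not from the paper): a Gaussian cluster,
# one point at its center, and one point far away from it.
_gen = random.Random(42)
inlier = (0.0, 0.0)
outlier = (8.0, 8.0)
data = [(_gen.gauss(0, 1), _gen.gauss(0, 1)) for _ in range(64)] + [inlier, outlier]

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, mean_depth(outlier, data, n), mean_depth(inlier, data, n))
```

As the number of trees grows, the printed averages should stabilize, and the distant point should be isolated at a noticeably smaller average depth than the central one. The sketch says nothing about the counterexamples or quality certificates discussed in the abstract.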
