A meta-algorithm for classification using random recursive tree ensembles: A high energy physics application

Title

A meta-algorithm for classification using random recursive tree ensembles: A high energy physics application

Authors

Lalchand, Vidhi

Abstract

The aim of this work is to propose a meta-algorithm for automatic classification in the presence of discrete binary classes. Classifier learning in the presence of overlapping class distributions is a challenging problem in machine learning. Overlapping classes are characterized by ambiguous regions in the feature space that contain a high density of points belonging to both classes. This often occurs in real-world datasets; one example is numeric data describing properties of particle decays recorded at high-energy accelerators such as the Large Hadron Collider (LHC). A significant body of research on the class overlap problem uses ensemble classifiers to boost the performance of algorithms, either by applying them iteratively in multiple stages or by training multiple copies of the same model on different subsets of the input training data. The former is called boosting and the latter is called bagging. The algorithm proposed in this thesis targets a challenging classification problem in high energy physics: improving the statistical significance of the Higgs discovery. The dataset used to train the algorithm is experimental data built from the official ATLAS full-detector simulation, in which Higgs events (signal) are mixed with different background events (background) that closely mimic the statistical properties of the signal, generating class overlap. The proposed algorithm is a variant of the classical boosted decision tree, which is known to be one of the most successful analysis techniques in experimental physics. The algorithm uses a unified framework that combines two meta-learning techniques, bagging and boosting. The results show that this combination only works in the presence of a randomization trick in the base learners.
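To make the bagging-plus-boosting idea concrete, here is a minimal sketch under stated assumptions; it is not the thesis's algorithm. The abstract does not specify the base learner, the form of the randomization trick, the number of bags or boosting rounds, or how scores are combined, so all of these are illustrative choices. Here the base learners are depth-1 trees (stumps) whose feature and threshold are drawn at random from a few candidates, in the spirit of extremely randomized trees; each bag runs discrete AdaBoost on a bootstrap resample, and the bags vote by majority. The function names (`random_stump`, `adaboost`, `bagged_boosting`, `predict`, `make_overlapping_data`) and parameters (`n_candidates`, `n_bags`, `n_rounds`) are hypothetical, and two overlapping Gaussians stand in for the ATLAS simulation data, which is not reproduced here.

```python
# Hypothetical sketch: bagging wrapped around AdaBoost whose base learners are
# randomized decision stumps. Names, parameters and the randomization scheme are
# illustrative assumptions, not the thesis's published algorithm.
import math
import random


def random_stump(X, y, w, rng, n_candidates=5):
    """Randomized depth-1 tree (assumed form of the "randomization trick"):
    draw a few random (feature, threshold) pairs, as in extremely randomized
    trees, and keep the one with the lowest weighted error.
    Labels are +1 (signal) / -1 (background)."""
    best = None
    for _ in range(n_candidates):
        j = rng.randrange(len(X[0]))
        col = [row[j] for row in X]
        t = rng.uniform(min(col), max(col))
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(col, y, w)
                      if (polarity if xi > t else -polarity) != yi)
            if best is None or err < best[0]:
                best = (err, j, t, polarity)
    return best


def stump_predict(stump, x):
    _, j, t, polarity = stump
    return polarity if x[j] > t else -polarity


def adaboost(X, y, n_rounds, rng):
    """Discrete AdaBoost over randomized stumps; returns [(alpha, stump), ...]."""
    n = len(X)
    w = [1.0 / n] * n  # weights are kept normalized to sum to 1
    model = []
    for _ in range(n_rounds):
        stump = random_stump(X, y, w, rng)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        if err >= 0.5:
            continue  # no better than chance under the current weights
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Up-weight misclassified events, down-weight correctly classified ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def boosted_score(model, x):
    return sum(alpha * stump_predict(s, x) for alpha, s in model)


def bagged_boosting(X, y, n_bags=11, n_rounds=20, seed=0):
    """Bagging layer: one AdaBoost model per bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        ensemble.append(adaboost(Xb, yb, n_rounds, rng))
    return ensemble


def predict(ensemble, x):
    """Majority vote over the sign of each bag's boosted score."""
    votes = sum(1 if boosted_score(m, x) > 0 else -1 for m in ensemble)
    return 1 if votes > 0 else -1


def make_overlapping_data(n, rng):
    """Toy stand-in for signal/background: two 2-D Gaussians whose means are
    close enough relative to their spread that the classes overlap."""
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([rng.gauss(label, 1.0), rng.gauss(label, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    data_rng = random.Random(1)
    X_train, y_train = make_overlapping_data(400, data_rng)
    X_test, y_test = make_overlapping_data(400, data_rng)
    ensemble = bagged_boosting(X_train, y_train)
    correct = sum(predict(ensemble, x) == t for x, t in zip(X_test, y_test))
    print(f"test accuracy: {correct / len(y_test):.3f}")
```

Wrapping bagging around boosting is only one possible ordering; the abstract says only that the two are combined in a unified framework. Raising `n_candidates` makes each stump close to greedy and removes most of the randomization. The abstract reports that the combination needs that randomization, but this toy example does not test that claim.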
