论文标题
使用动态量化来在线树回归器中进行拆分尝试
Using dynamical quantization to perform split attempts in online tree regressors
论文作者
论文摘要
在线决策树解决方案的一个主要方面是评估传入的数据并促进模型增长。因此,树木很多涉及不同种类的输入功能,并将其分开以从数据中学习。数值功能也不例外,与其他类型的功能相比,它们构成了其他挑战,因为没有琐碎的策略可以选择做出分裂决定的最佳点。问题在回归任务中更具挑战性,因为功能和目标都是连续的。典型的在线解决方案评估并存储分裂尝试之间监视的所有点,这与实时应用程序中构成的约束有关。在本文中,我们介绍了量化观察者(QO),这是一种简单但有效的基于哈希的算法,可在线树回归器中监视和评估候选者的分数候选者。 QO可以轻松地集成到增量决策树中,例如Hoeffding树,并且每个实例的监视成本为$ O(1)$,以及评估拆分候选者的次线性成本。以前的解决方案具有每个插入的$ O(\ log n)$成本(在最好的情况下),并具有线性成本来评估分分点。我们广泛的实验设置强调了QO在提供准确的分式建议方面的有效性,而与竞争对手相比,花费的记忆和处理时间要少得多。
A central aspect of online decision tree solutions is evaluating the incoming data and enabling model growth. For such, trees much deal with different kinds of input features and partition them to learn from the data. Numerical features are no exception, and they pose additional challenges compared to other kinds of features, as there is no trivial strategy to choose the best point to make a split decision. The problem is even more challenging in regression tasks because both the features and the target are continuous. Typical online solutions evaluate and store all the points monitored between split attempts, which goes against the constraints posed in real-time applications. In this paper, we introduce the Quantization Observer (QO), a simple yet effective hashing-based algorithm to monitor and evaluate split point candidates in numerical features for online tree regressors. QO can be easily integrated into incremental decision trees, such as Hoeffding Trees, and it has a monitoring cost of $O(1)$ per instance and sub-linear cost to evaluate split candidates. Previous solutions had a $O(\log n)$ cost per insertion (in the best case) and a linear cost to evaluate split points. Our extensive experimental setup highlights QO's effectiveness in providing accurate split point suggestions while spending much less memory and processing time than its competitors.