Interpretable Random Forests via Rule Extraction

Paper title

Interpretable Random Forests via Rule Extraction

Authors

Clément Bénard, Gérard Biau, Sébastien da Veiga, Erwan Scornet

Abstract

We introduce SIRUS (Stable and Interpretable RUle Set) for regression, a stable rule learning algorithm which takes the form of a short and simple list of rules. State-of-the-art learning algorithms are often referred to as "black boxes" because of the large number of operations involved in their prediction process. Despite their strong predictive performance, this lack of interpretability may be highly restrictive for applications with critical decisions at stake. On the other hand, algorithms with a simple structure (typically decision trees, rule algorithms, or sparse linear models) are well known for their instability. This undesirable feature makes the conclusions of the data analysis unreliable and turns out to be a strong operational limitation. This motivates the design of SIRUS, which combines a simple structure with remarkably stable behavior when the data is perturbed. The algorithm is based on random forests, the predictive accuracy of which is preserved. We demonstrate the efficiency of the method both empirically (through experiments) and theoretically (with a proof of its asymptotic stability). Our R/C++ software implementation sirus is available from CRAN.
