Paper title
Sequential Permutation Testing of Random Forest Variable Importance Measures
Paper authors
Paper abstract
Hypothesis testing of random forest (RF) variable importance measures (VIMP) remains the subject of ongoing research. Among recent developments, heuristic approaches to parametric testing have been proposed whose distributional assumptions are based on empirical evidence. Other formal tests have been derived analytically under regularity conditions. However, these approaches can be computationally expensive or even practically infeasible. The same problem arises with non-parametric permutation tests, which are nevertheless distribution-free and can be applied generically to any type of RF and VIMP. Building on this advantage, it is proposed here to use sequential permutation tests and sequential p-value estimation to reduce the high computational costs associated with conventional permutation tests. The popular and widely used permutation VIMP serves as a practical and relevant application example. The results of simulation studies confirm that the theoretical properties of the sequential tests hold: the type-I error probability is controlled at the nominal level, and high power is maintained with considerably fewer permutations than conventional permutation testing requires. The numerical stability of the methods is investigated in two additional application studies. In summary, theoretically sound sequential permutation testing of VIMP is possible at greatly reduced computational costs. Recommendations for application are given. An implementation is provided in the accompanying R package rfvimptest. The approach can also be easily applied to any kind of prediction model.
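To make the idea of sequential p-value estimation concrete, the following is a minimal Python sketch of one well-known scheme, in the style of Besag and Clifford (1991): null statistics are drawn one at a time, and sampling stops as soon as a fixed number of them reach the observed value. It is an illustration under stated assumptions, not the rfvimptest implementation. The function name, the parameters `h` and `max_perms`, and the toy data are hypothetical, and an absolute covariance stands in for a random forest VIMP so that the example runs with the standard library alone.

```python
import random


def sequential_permutation_pvalue(observed, draw_null, h=8, max_perms=500):
    """Sequential Monte Carlo p-value estimate (Besag-Clifford style).

    Draws null statistics one at a time and stops as soon as `h` of them
    are >= `observed`, or after `max_perms` draws.
    Returns (p_value_estimate, number_of_draws_used).
    """
    exceed = 0
    for n_draws in range(1, max_perms + 1):
        if draw_null() >= observed:
            exceed += 1
            if exceed == h:
                # Early stop: h exceedances reached after n_draws draws.
                return exceed / n_draws, n_draws
    # Budget exhausted: the usual Monte Carlo estimate with a +1 correction.
    return (exceed + 1) / (max_perms + 1), max_perms


def abs_covariance(x, y):
    """Toy importance statistic (assumption: a stand-in for a VIMP)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return abs(sum((a - mx) * (b - my) for a, b in zip(x, y))) / len(x)


rng = random.Random(0)
n = 200
x_signal = [rng.gauss(0, 1) for _ in range(n)]  # associated with y
x_noise = [rng.gauss(0, 1) for _ in range(n)]   # unrelated to y
y = [2.0 * a + rng.gauss(0, 1) for a in x_signal]


def make_null(x, y):
    """Return a function that recomputes the statistic on a permuted x."""
    def draw():
        shuffled = x[:]
        rng.shuffle(shuffled)
        return abs_covariance(shuffled, y)
    return draw


p_signal, used_signal = sequential_permutation_pvalue(
    abs_covariance(x_signal, y), make_null(x_signal, y))
p_noise, used_noise = sequential_permutation_pvalue(
    abs_covariance(x_noise, y), make_null(x_noise, y))
print("signal:", p_signal, used_signal)
print("noise: ", p_noise, used_noise)
```

In this sketch a variable unrelated to the response tends to stop after few draws, because exceedances accumulate quickly. A strongly associated variable uses the full budget and receives the smallest attainable estimate, 1/(max_perms + 1). To test an actual VIMP, `draw_null` would refit or re-evaluate the model on data with the variable of interest permuted, which is the step that makes saving permutations worthwhile.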