Test and Measure for Partial Mean Dependence Based on Machine Learning Methods

Paper Title

Test and Measure for Partial Mean Dependence Based on Machine Learning Methods

Authors

Leheng Cai, Xu Guo, Wei Zhong

Abstract

In regression modeling, it is important to assess the significance of a subset of covariates $W$ for the response $Y$ given covariates $Z$. To this end, we propose a significance test for the partial mean independence problem based on machine learning methods and data splitting. The test statistic converges to the standard chi-squared distribution under the null hypothesis and to a normal distribution under a fixed alternative hypothesis. Power enhancement and algorithm stability are also discussed. If the null hypothesis is rejected, we propose a partial Generalized Measure of Correlation (pGMC) to measure the partial mean dependence of $Y$ given $W$ after controlling for the nonlinear effect of $Z$. We present the appealing theoretical properties of the pGMC and establish the asymptotic normality of its estimator with the optimal root-$N$ convergence rate. Furthermore, a valid confidence interval for the pGMC is derived. As an important special case, when there are no conditioning covariates $Z$, we introduce a new test of the overall significance of covariates for the response in a model-free setting. Numerical studies and real data analysis are conducted to compare with existing approaches and to demonstrate the validity and flexibility of the proposed procedures.
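
For orientation, the partial mean independence problem named in the abstract can be written as a pair of hypotheses. The display below is a sketch in the standard form for conditional mean independence, using the abstract's symbols $Y$, $W$ and $Z$; the exact statement and notation in the paper may differ.

$$
H_0:\; E(Y \mid W, Z) = E(Y \mid Z) \ \text{almost surely}
\qquad \text{versus} \qquad
H_1:\; P\{E(Y \mid W, Z) = E(Y \mid Z)\} < 1 .
$$

In the special case with no conditioning covariates $Z$, the null hypothesis would presumably reduce to $E(Y \mid W) = E(Y)$ almost surely, that is, $W$ carries no information about the mean of $Y$.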
