Does imputation matter? Benchmark for predictive models

Paper title

Does imputation matter? Benchmark for predictive models

Authors

Katarzyna Woźnica, Przemysław Biecek

Abstract

Incomplete data are common in practical applications. Most predictive machine learning models cannot handle missing values, so they require some preprocessing. Although many algorithms are used for data imputation, we do not understand the impact of the different methods on the performance of predictive models. This paper is the first to systematically evaluate the empirical effectiveness of data imputation algorithms for predictive models. The main contributions are (1) the recommendation of a general method for empirical benchmarking based on real-life classification tasks and (2) a comparative analysis of different imputation methods for a collection of data sets and a collection of ML algorithms.
