Paper title

On Model Evaluation under Non-constant Class Imbalance

Authors

Jan Brabec, Tomáš Komárek, Vojtěch Franc, Lukáš Machlica

Abstract

Many real-world classification problems are significantly class-imbalanced to the detriment of the class of interest. The standard set of proper evaluation metrics is well known, but the usual assumption is that the test dataset imbalance equals the real-world imbalance. In practice, this assumption is often broken for various reasons. The reported results are then often too optimistic and may lead to wrong conclusions about the industrial impact and suitability of the proposed techniques. We introduce methods focusing on evaluation under non-constant class imbalance. We show that a change of the imbalance rate affects not only the absolute values of commonly used metrics but even the ranking of classifiers under a given evaluation metric. Finally, we demonstrate that subsampling the test dataset so that its class imbalance equals the one observed in the wild is not necessary, and can eventually lead to significant errors in the estimate of the classifier's performance.
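
The effect of the imbalance rate on precision-based metrics can be illustrated with a short sketch. It assumes a fixed decision threshold and that only the class prior differs between the test set and deployment, so the true positive rate (TPR) and false positive rate (FPR) measured on the test set carry over unchanged. With eta negatives per positive, precision is then TPR / (TPR + eta * FPR). This is the standard prior-shift adjustment, not necessarily the exact method proposed in the paper; the names `Rates`, `rates_from_predictions`, `precision_at`, `f1_at` and the two classifiers' operating points are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Operating point of a classifier at a fixed threshold (hypothetical helper)."""
    tpr: float  # true positive rate (recall); does not depend on the class prior
    fpr: float  # false positive rate; does not depend on the class prior


def rates_from_predictions(y_true, y_pred) -> Rates:
    """Estimate TPR and FPR from a labeled test set, using all of its samples (no subsampling)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return Rates(tpr=tp / (tp + fn), fpr=fp / (fp + tn))


def precision_at(rates: Rates, eta: float) -> float:
    """Precision when there are `eta` negatives per positive.

    Expected TP is proportional to tpr * 1 and expected FP to fpr * eta,
    so precision = tpr / (tpr + eta * fpr).
    """
    denom = rates.tpr + eta * rates.fpr
    return rates.tpr / denom if denom > 0 else 0.0


def f1_at(rates: Rates, eta: float) -> float:
    """F1 score at imbalance eta; recall (= TPR) is unaffected by eta."""
    p = precision_at(rates, eta)
    r = rates.tpr
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical operating points: A catches more positives but raises more false alarms.
    clf_a = Rates(tpr=0.9, fpr=0.01)
    clf_b = Rates(tpr=0.6, fpr=0.001)
    for eta in (1, 1000):
        print(f"eta={eta}: F1(A)={f1_at(clf_a, eta):.3f}, F1(B)={f1_at(clf_b, eta):.3f}")
```

With these made-up numbers, A has the higher F1 on a balanced test set (eta = 1, roughly 0.94 vs 0.75), while B has the higher F1 at 1000 negatives per positive (roughly 0.46 vs 0.15), so the ranking flips with the imbalance rate alone. Because TPR and FPR are estimated on the whole test set and the target ratio is plugged in afterwards, no negatives need to be discarded to mimic the deployment imbalance.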
