Paper Title

Investigating the effect of binning on causal discovery

Authors

Deckert, Andrew Colt; Kummerfeld, Erich

Abstract

Binning (a.k.a. discretization) of numerically continuous measurements is a widespread but controversial practice in data collection, analysis, and presentation. The consequences of binning have been evaluated for many different kinds of data analysis methods; so far, however, the effect of binning on causal discovery algorithms has not been directly investigated. This paper reports the results of a simulation study that examined the effect of binning on the Greedy Equivalence Search (GES) causal discovery algorithm. Our findings suggest that unbinned continuous data often result in the highest search performance, but some exceptions are identified. We also found that binned data are more sensitive to changes in sample size and tuning parameters, and identified some interactive effects of sample size, binning, and tuning parameters on performance.
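To make the notion of binning concrete, the following is a minimal sketch and not the authors' simulation setup. It assumes equal-frequency (quantile) bins, a single linear-Gaussian edge X → Y with an arbitrary coefficient of 0.8, and a hypothetical helper named `quantile_bin`; none of these choices comes from the paper. It does not run GES. It only compares the correlation between the two variables before and after binning, which is one plausible reason a score-based search could behave differently on binned data.

```python
import numpy as np


def quantile_bin(values, n_bins):
    """Map continuous values to integer codes 0..n_bins-1 (equal-frequency bins).

    Hypothetical helper for illustration; the paper's binning scheme may differ.
    """
    # The n_bins - 1 interior quantiles act as cut points.
    cuts = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    # searchsorted returns, for each value, how many cut points lie at or below it.
    return np.searchsorted(cuts, values, side="right")


# Assumed toy model: X -> Y, linear with Gaussian noise (not taken from the paper).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.6, size=n)

# Pearson correlation on the unbinned data.
r_continuous = np.corrcoef(x, y)[0, 1]

# Pearson correlation after binning both variables into k bins.
r_binned = {
    k: np.corrcoef(quantile_bin(x, k), quantile_bin(y, k))[0, 1]
    for k in (2, 3, 5, 10)
}

print(f"continuous: r = {r_continuous:.3f}")
for k, r in r_binned.items():
    print(f"{k:>2} bins:    r = {r:.3f}")
```

In this toy setting, coarse binning (for example a median split with 2 bins) noticeably weakens the measured association, while finer binning tracks the continuous value more closely. Whether and how this carries over to GES performance is what the simulation study itself examines.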
