Title
Using Mixed-Effects Models to Learn Bayesian Networks from Related Data Sets
Authors
Abstract
We commonly assume that data are a homogeneous set of observations when learning the structure of Bayesian networks. However, they often comprise different data sets that are related but not homogeneous because they have been collected in different ways or from different populations. In our previous work (Azzimonti, Corani and Scutari, 2021), we proposed a closed-form Bayesian Hierarchical Dirichlet score for discrete data that pools information across related data sets to learn a single encompassing network structure, while taking into account the differences in their probabilistic structures. In this paper, we provide an analogous solution for learning a Bayesian network from continuous data using mixed-effects models to pool information across the related data sets. We study its structural, parametric, predictive and classification accuracy and we show that it outperforms both conditional Gaussian Bayesian networks (that do not perform any pooling) and classical Gaussian Bayesian networks (that disregard the heterogeneous nature of the data). The improvement is marked for low sample sizes and for unbalanced data sets.
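As a rough illustration of what "pooling information across related data sets" means for a single continuous variable, the sketch below contrasts three estimates of a per-data-set mean: no pooling (each data set alone), complete pooling (one grand mean that ignores heterogeneity), and partial pooling under a random-intercept model. This is not the scoring method of the paper. The function name `partial_pooling_means`, the toy data, the ANOVA-style method-of-moments variance estimates, and the use of the grand mean as the shrinkage target are all assumptions made for illustration.

```python
import statistics


def partial_pooling_means(groups):
    """Hypothetical helper: shrink each group's mean toward the grand mean.

    groups maps a data-set name to a list of observations.
    Assumes at least two groups and more observations than groups.
    Returns (grand_mean, per_group_means, partially_pooled_means).
    """
    sizes = {g: len(v) for g, v in groups.items()}
    total = sum(sizes.values())
    k = len(groups)

    # No pooling: each data set is summarised on its own.
    means = {g: statistics.fmean(v) for g, v in groups.items()}
    # Complete pooling: a single mean that ignores the grouping.
    grand = sum(sum(v) for v in groups.values()) / total

    # Residual (within-group) variance, pooled across groups.
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    sigma2 = ss_within / (total - k)

    # Between-group variance via the one-way ANOVA moment estimator,
    # truncated at zero; n0 adjusts for unbalanced group sizes.
    ms_between = sum(sizes[g] * (means[g] - grand) ** 2 for g in groups) / (k - 1)
    n0 = (total - sum(n * n for n in sizes.values()) / total) / (k - 1)
    tau2 = max(0.0, (ms_between - sigma2) / n0)

    # Partial pooling: the weight on a group's own mean grows with its size.
    shrunk = {}
    for g in groups:
        weight = 0.0 if tau2 == 0.0 else tau2 / (tau2 + sigma2 / sizes[g])
        shrunk[g] = grand + weight * (means[g] - grand)
    return grand, means, shrunk


# Toy, unbalanced data: "a" is a small data set, "b" a larger one.
toy = {"a": [0.0, 2.0], "b": [9.0, 11.0, 10.0, 10.0, 9.0, 11.0]}
grand, means, shrunk = partial_pooling_means(toy)
for name in toy:
    print(name, "no pooling:", means[name], "partial pooling:", shrunk[name])
print("complete pooling:", grand)
```

The smaller data set is pulled proportionally further toward the grand mean than the larger one, which is the intuition behind why pooling tends to help most with low sample sizes and unbalanced data sets.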