Paper Title
Improvements to Supervised EM Learning of Shared Kernel Models by Feature Space Partitioning
Paper Authors
Paper Abstract
Expectation maximisation (EM) is usually thought of as an unsupervised learning method for estimating the parameters of a mixture distribution; however, it can also be used for supervised learning when class labels are available. As such, EM has been applied to train neural networks, including the probabilistic radial basis function (PRBF) network or shared kernel (SK) model. This paper addresses two major shortcomings of previous work in this area: the lack of rigour in the derivation of the EM training algorithm, and the computational complexity of the technique, which has limited it to low-dimensional data sets. We first present a detailed derivation of EM for the Gaussian shared kernel model PRBF classifier, making use of data association theory to obtain the complete data likelihood, Baum's auxiliary function (the E-step) and its subsequent maximisation (the M-step). To reduce the complexity of the resulting SKEM algorithm, we partition the feature space into $R$ non-overlapping subsets of variables. The resulting product decomposition of the joint data likelihood, which is exact when the feature partitions are independent, allows SKEM to be implemented in parallel and at $R^2$ times lower complexity. The operation of the partitioned SKEM algorithm is demonstrated on the MNIST data set and compared with its non-partitioned counterpart. We find that improved performance at reduced complexity is achievable. Comparisons with standard classification algorithms on a number of other benchmark data sets are also provided.
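To make the abstract's description concrete, the following Python/NumPy code is a minimal sketch of supervised EM for a Gaussian shared-kernel classifier with feature-space partitioning. It is an illustration under stated assumptions, not the authors' implementation: diagonal covariances are assumed purely to keep the sketch short, and the names fit_skem, class_log_lik and predict_partitioned are hypothetical.

import numpy as np

def fit_skem(X, y, n_kernels, n_classes, n_iters=50, eps=1e-6):
    """Supervised EM for a shared-kernel (SK) model: all classes share the
    same Gaussian kernels; only the mixing weights are class-specific."""
    n, d = X.shape
    rng = np.random.default_rng(0)
    mu = X[rng.choice(n, n_kernels, replace=False)]       # shared kernel means
    var = np.tile(X.var(axis=0), (n_kernels, 1)) + eps    # diagonal variances
    w = np.full((n_classes, n_kernels), 1.0 / n_kernels)  # per-class weights
    for _ in range(n_iters):
        # E-step: kernel responsibilities, weighted by each sample's own
        # class mixing weights (this is where the class labels enter).
        log_k = -0.5 * (((X[:, None, :] - mu) ** 2 / var).sum(-1)
                        + np.log(2.0 * np.pi * var).sum(-1))
        log_r = np.log(w[y] + eps) + log_k
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form updates of the shared means/variances and the
        # per-class mixing weights.
        nk = r.sum(axis=0) + eps
        mu = (r.T @ X) / nk[:, None]
        var = (r.T @ X ** 2) / nk[:, None] - mu ** 2 + eps
        for c in range(n_classes):
            w[c] = r[y == c].sum(axis=0) / max((y == c).sum(), 1)
    return mu, var, w

def class_log_lik(X, mu, var, w, eps=1e-6):
    """log p(x | c) for every class under a fitted SK model."""
    log_k = -0.5 * (((X[:, None, :] - mu) ** 2 / var).sum(-1)
                    + np.log(2.0 * np.pi * var).sum(-1))
    a = np.log(w[None, :, :] + eps) + log_k[:, None, :]   # (n, classes, kernels)
    m = a.max(axis=2)
    return m + np.log(np.exp(a - m[:, :, None]).sum(axis=2))  # log-sum-exp

def predict_partitioned(X, models, partitions):
    """Treat the feature subsets as independent given the class, so the joint
    likelihood factorises: sum the per-partition class log-likelihoods.
    Uniform class priors are assumed for simplicity."""
    total = sum(class_log_lik(X[:, idx], *m)
                for m, idx in zip(models, partitions))
    return total.argmax(axis=1)

# Hypothetical usage with R = 2 equal-width partitions of d features:
#   partitions = np.array_split(np.arange(d), 2)
#   models = [fit_skem(X[:, idx], y, n_kernels=10, n_classes=10)
#             for idx in partitions]           # the R fits can run in parallel
#   y_hat = predict_partitioned(X_test, models, partitions)

Each per-partition model is fitted on its own subset of variables, so the $R$ fits can run in parallel, and summing the per-partition class log-likelihoods in predict_partitioned realises the product decomposition of the joint data likelihood, which is exact when the feature subsets are class-conditionally independent.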