Paper title
On the Influence of Enforcing Model Identifiability on Learning Dynamics of Gaussian Mixture Models
Paper authors
Paper abstract
A common way to learn and analyze statistical models is to consider operations in the model parameter space. But what happens if we optimize in the parameter space and there is no one-to-one mapping between the parameter space and the underlying statistical model space? Such cases frequently occur for hierarchical models, including statistical mixtures and stochastic neural networks, and these models are said to be singular. Singular models give rise to several important and well-studied problems in machine learning, such as slowed convergence of learning trajectories due to attractor behavior. In this work, we propose a relative reparameterization technique for the parameter space, which yields a general method for extracting regular submodels from singular models. Our method enforces model identifiability during training, and we study the learning dynamics of gradient descent and expectation maximization for Gaussian Mixture Models (GMMs) under relative reparameterization, showing faster convergence in experiments and an improved shape of the dynamics' manifold around the singularity. Extending the analysis beyond GMMs, we further analyze the Fisher information matrix under relative reparameterization and its influence on the generalization error, and show how the method can be applied to more complex models such as deep neural networks.
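
The abstract does not spell out the reparameterization itself, so the following is only a minimal illustrative sketch of the general idea: a two-component, equal-weight, unit-variance 1D GMM whose label-swapping non-identifiability is removed by reparameterizing the second mean as mu2 = mu1 + exp(delta), so that mu2 > mu1 always holds. The ordering trick, the names gmm_nll and numerical_grad, and all hyperparameters are assumptions made for this toy example, not the paper's actual construction.

```python
import numpy as np

def gmm_nll(params, x):
    """Negative log-likelihood of a two-component, equal-weight,
    unit-variance 1D GMM under an ordered reparameterization."""
    mu1, delta = params
    mu2 = mu1 + np.exp(delta)  # ordering constraint: mu2 > mu1 by construction
    comp1 = np.exp(-0.5 * (x - mu1) ** 2)
    comp2 = np.exp(-0.5 * (x - mu2) ** 2)
    lik = 0.5 * (comp1 + comp2) / np.sqrt(2.0 * np.pi)
    return -np.mean(np.log(lik))

def numerical_grad(f, params, eps=1e-5):
    """Central-difference gradient (keeps the sketch dependency-free)."""
    g = np.zeros_like(params)
    for i in range(len(params)):
        e = np.zeros_like(params)
        e[i] = eps
        g[i] = (f(params + e) - f(params - e)) / (2.0 * eps)
    return g

# Toy data: a balanced mixture with well-separated means.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 500), rng.normal(2.0, 1.0, 500)])

# Start near the singular configuration mu1 == mu2 and run plain
# gradient descent in the reparameterized coordinates (mu1, delta).
params = np.array([0.0, 0.0])
for step in range(2000):
    params -= 0.1 * numerical_grad(lambda p: gmm_nll(p, x), params)

mu1, delta = params
print(f"mu1 = {mu1:.3f}, mu2 = {mu1 + np.exp(delta):.3f}")
```

In the raw (mu1, mu2) coordinates, swapping the two means yields the same distribution; the ordering constraint picks one representative per distribution, which is one simple way to obtain an identifiable (regular) submodel of the kind the abstract alludes to.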
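
Along the same lines, the Fisher information matrix mentioned in the abstract can be probed empirically in the reparameterized coordinates by averaging outer products of per-sample score vectors. The sketch below continues the toy example above (reusing gmm_nll, numerical_grad, params, and x) and is again an assumption-laden illustration, not the paper's analysis.

```python
import numpy as np

def empirical_fisher(params, x):
    """Empirical Fisher information in the reparameterized coordinates:
    the average outer product of per-sample score vectors. Each per-sample
    score is the gradient of the single-sample negative log-likelihood,
    whose outer product equals that of the log-likelihood score."""
    scores = np.stack([
        numerical_grad(lambda p: gmm_nll(p, np.array([xi])), params)
        for xi in x
    ])
    return scores.T @ scores / len(x)

F = empirical_fisher(params, x)
print("empirical Fisher information:\n", F)
print("eigenvalues:", np.linalg.eigvalsh(F))
```

In the raw (mu1, mu2) coordinates the Fisher information degenerates at the singular point mu1 == mu2, where the mixture collapses to a single Gaussian; inspecting the eigenvalues of F at the fitted parameters gives a quick check that the reparameterized coordinates stay non-degenerate away from that collapse.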