Title
Efficient and accurate inference for mixtures of Mallows models with Spearman distance
Authors
Abstract
The Mallows model occupies a central role in the parametric modelling of ranking data to learn the preferences of a population of judges. Despite the wide range of metrics for rankings that can be considered in the model specification, the choice is typically limited to the Kendall, Cayley or Hamming distances, owing to the closed-form expression of the related model normalizing constant. This work instead focuses on the Mallows model with Spearman distance. An efficient and accurate EM algorithm for estimating finite mixtures of Mallows models with Spearman distance is developed by relying on a twofold data augmentation strategy aimed at (i) enlarging the applicability of Mallows models to samples drawn from heterogeneous populations and (ii) dealing with partial rankings affected by diverse forms of censoring. Additionally, a novel approximation of the model normalizing constant is introduced to support the challenging model-based clustering of rankings with a large number of items. The inferential ability of the EM scheme and the effectiveness of the approximation are assessed through extensive simulation studies. Finally, applications to three real-world datasets support our proposals, also in comparison with competing mixtures of ranking models.
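To make the model concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the Spearman distance is the sum of squared rank differences, d(pi, rho) = sum_i (pi_i - rho_i)^2, and the density P(pi | rho, theta) = exp(-theta * d(pi, rho)) / Z(theta). This distance is right-invariant, so Z(theta) does not depend on the consensus ranking rho. It can therefore be written as sum_d N_d * exp(-theta * d), where N_d counts the permutations at distance d from the identity. All function names are hypothetical.

The sketch enumerates all n! permutations, which is feasible only for small n. That limitation is why an approximation is needed for many items; the approximation proposed in the paper is not reproduced here. The last function computes mixture E-step responsibilities for a single complete ranking. Partial rankings, which the paper handles through data augmentation, are also outside this sketch.

```python
import itertools
import math
from collections import Counter


def spearman_distance(pi, sigma):
    # Assumed Spearman distance: sum of squared differences between rank vectors
    return sum((a - b) ** 2 for a, b in zip(pi, sigma))


def distance_frequencies(n):
    # Brute force over all n! permutations (small n only): count how many
    # rankings lie at each Spearman distance from the identity. By
    # right-invariance the same counts hold for any consensus ranking rho.
    identity = tuple(range(1, n + 1))
    return Counter(spearman_distance(p, identity)
                   for p in itertools.permutations(identity))


def log_normalizing_constant(theta, freqs):
    # log Z(theta) = log sum_d N_d * exp(-theta * d), via a stable log-sum-exp
    terms = [math.log(count) - theta * d for d, count in freqs.items()]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def mallows_log_pmf(pi, rho, theta, freqs):
    # Log-probability of a complete ranking pi under the Mallows model
    return (-theta * spearman_distance(pi, rho)
            - log_normalizing_constant(theta, freqs))


def responsibilities(pi, weights, rhos, thetas, freqs):
    # E-step for one complete ranking: posterior membership probabilities
    # of each mixture component, normalized on the log scale for stability
    logs = [math.log(w) + mallows_log_pmf(pi, r, t, freqs)
            for w, r, t in zip(weights, rhos, thetas)]
    m = max(logs)
    unnorm = [math.exp(v - m) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

For example, `distance_frequencies(3)` yields the counts {0: 1, 2: 2, 6: 2, 8: 1}. Every Spearman distance is even, and Z(0) equals 3! = 6. Working with distance frequencies rather than individual permutations is a convenient way to express Z(theta). Still, the counts themselves are only obtainable by enumeration here.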