Paper title
A Bayesian Nonparametric Approach to Species Sampling Problems with Ordering
Paper authors
Paper abstract
Species-sampling problems (SSPs) are a broad class of statistical problems that call for estimating (discrete) functionals of the unknown species composition of an unobservable population. A common feature of SSPs is their invariance with respect to species labeling, which is at the core of the Bayesian nonparametric (BNP) approach to SSPs under the popular Pitman-Yor process (PYP) prior. In this paper, we develop a BNP approach to SSPs that are not "invariant" to species labeling, in the sense that an ordering or ranking is assigned to species' labels. Inspired by the population genetics literature on age-ordered alleles' compositions, we study the following SSP with ordering: given an observable sample from an unknown population of individuals belonging to species (alleles), with species' labels being ordered according to weights (ages), estimate the frequencies of the first $r$ order species' labels in an enlarged sample obtained by including additional unobservable samples. By relying on an ordered PYP prior, we obtain an explicit posterior distribution of the first $r$ order frequencies, with estimates that are easy to implement and computationally efficient. We apply our approach to the analysis of genetic variation, showing its effectiveness in estimating the frequency of the oldest allele, and then we discuss other potential applications.
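To make the estimation target concrete, the following is a minimal simulation sketch, not the paper's method. It uses the standard PYP predictive rule with parameters `sigma` in [0, 1) and `theta` > 0: after n draws showing k distinct species with counts n_1, ..., n_k, the next draw is a new species with probability (theta + k*sigma)/(theta + n) and species j with probability (n_j - sigma)/(theta + n). As a loudly stated assumption, species are ordered here by first appearance in the sample, which only stands in for the weight (age) ordering of the abstract; the ordered PYP prior and its explicit posterior are not reproduced. The function names and parameter values are hypothetical.

```python
import random


def pyp_extend(counts, m, sigma, theta, rng):
    """Extend species counts by m draws from the Pitman-Yor predictive rule.

    counts: counts per species, listed in order of first appearance
            (an illustrative ordering, assumed for this sketch).
    Returns a new list; species discovered along the way are appended.
    """
    counts = list(counts)  # copy so the observed sample is left untouched
    n = sum(counts)
    for _ in range(m):
        k = len(counts)
        # Probability that the next individual belongs to a new species.
        p_new = (theta + k * sigma) / (theta + n)
        if rng.random() < p_new:
            counts.append(1)
        else:
            # Existing species j is drawn with weight proportional to n_j - sigma.
            weights = [c - sigma for c in counts]
            j = rng.choices(range(k), weights=weights)[0]
            counts[j] += 1
        n += 1
    return counts


def first_r_frequencies(counts, r):
    """Relative frequencies of the first r species in the given ordering."""
    total = sum(counts)
    return [c / total for c in counts[:r]]


def mc_first_r(observed, m, r, sigma, theta, n_sims, seed=0):
    """Monte Carlo average of the first-r frequencies in the enlarged sample."""
    rng = random.Random(seed)
    sums = [0.0] * min(r, len(observed))
    for _ in range(n_sims):
        enlarged = pyp_extend(observed, m, sigma, theta, rng)
        for i, f in enumerate(first_r_frequencies(enlarged, len(sums))):
            sums[i] += f
    return [s / n_sims for s in sums]


# Hypothetical observed sample: 50 individuals simulated from the same rule.
rng = random.Random(0)
observed = pyp_extend([], 50, sigma=0.5, theta=1.0, rng=rng)
estimate = mc_first_r(observed, m=200, r=3, sigma=0.5, theta=1.0, n_sims=500)
print(estimate)
```

The Monte Carlo average is a brute-force approximation of the kind of quantity the abstract says is available in closed form, so a sketch like this is mainly useful as a sanity check under the assumed ordering.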