Title
Statistical Inference for Cell Type Deconvolution
Authors
Abstract
Integrating data from different platforms, such as bulk and single-cell RNA sequencing, is crucial for improving the accuracy and interpretability of complex biological analyses such as cell type deconvolution. However, this task is complicated by measurement and biological heterogeneity between target and reference datasets. In cell type deconvolution, existing methods often neglect the correlation and uncertainty in cell type proportion estimates, which may lead to false positives in downstream comparisons across multiple individuals. We introduce MEAD, a comprehensive statistical framework that not only estimates cell type proportions but also provides asymptotically valid statistical inference on the estimates. One of our key contributions is an identifiability result, which rigorously establishes the conditions under which cell type proportions are identifiable despite arbitrary heterogeneity of measurement biases between platforms. MEAD also supports the comparison of cell type proportions across individuals after deconvolution, accounting for gene-gene correlations and biological variability. Through simulations and real-data analysis, MEAD demonstrates superior reliability for inferring cell type compositions in complex biological systems.