Title

A flexible model-based framework for robust estimation of mutational signatures

Authors

Ragnhild Laursen, Lasse Maretty, Asger Hobolth

Abstract

Somatic mutations in cancer can be viewed as a mixture distribution of several mutational signatures, which can be inferred using non-negative matrix factorization (NMF). Mutational signatures have previously been parametrized using either simple mono-nucleotide interaction models or general tri-nucleotide interaction models. We describe a flexible and novel framework for identifying biologically plausible parametrizations of mutational signatures, and in particular for estimating di-nucleotide interaction models. The estimation procedure is based on the expectation-maximization (EM) algorithm and regression in the log-linear quasi-Poisson model. We show that di-nucleotide interaction signatures are statistically stable and sufficiently complex to fit the mutational patterns. Di-nucleotide interaction signatures often strike the right balance between appropriately fitting the data and avoiding over-fitting. They provide a better fit to data and are biologically more plausible than mono-nucleotide interaction signatures, and the parametrization is more stable than the parameter-rich tri-nucleotide interaction signatures. We illustrate our framework on three data sets of somatic mutation counts from cancer patients.
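
To make the NMF step mentioned in the abstract concrete, the following is a minimal sketch of unconstrained Poisson NMF on a count matrix. It is not the authors' implementation: it uses the standard multiplicative updates for the generalized Kullback-Leibler divergence, which are known to coincide with EM updates under a Poisson model, and it omits the log-linear quasi-Poisson regression that the paper uses to parametrize signatures. The function names (poisson_nmf, gkl), the matrix orientation (patients by mutation types), and all default values are assumptions made for illustration.

```python
import numpy as np


def poisson_nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Approximate a non-negative count matrix V (patients x mutation types)
    by W @ H, where rows of H are signatures and W holds exposures.

    Hypothetical helper for illustration; unconstrained signatures only.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps  # exposures, strictly positive start
    H = rng.random((rank, m)) + eps  # signatures, strictly positive start

    for _ in range(n_iter):
        # Update exposures given the current signatures.
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1) + eps)
        # Update signatures given the new exposures.
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)

    # Rescale so each signature is a probability distribution over
    # mutation types; the product W @ H is unchanged by this step.
    scale = H.sum(axis=1)
    H = H / scale[:, None]
    W = W * scale
    return W, H


def gkl(V, WH, eps=1e-12):
    """Generalized Kullback-Leibler divergence between counts V and fit WH."""
    ratio = np.where(V > 0, V / (WH + eps), 1.0)  # treat 0 * log(0) as 0
    return float(np.sum(V * np.log(ratio) - V + WH))


if __name__ == "__main__":
    # Synthetic example: 30 patients, 96 mutation types, 3 signatures.
    rng = np.random.default_rng(1)
    H_true = rng.dirichlet(np.ones(96), size=3)
    W_true = rng.gamma(shape=2.0, scale=200.0, size=(30, 3))
    V = rng.poisson(W_true @ H_true)

    W, H = poisson_nmf(V, rank=3)
    print("divergence after fitting:", gkl(V, W @ H))
```

In the framework described in the abstract, the signature update would instead be a regression in a log-linear quasi-Poisson model, which is what restricts each signature to a mono-, di-, or tri-nucleotide interaction structure; the sketch above corresponds only to the fully general, parameter-rich case.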
