Paper title

Inference of nonlinear causal effects with GWAS summary data

Authors

Ben Dai, Chunlin Li, Haoran Xue, Wei Pan, Xiaotong Shen

Abstract

Large-scale genome-wide association studies (GWAS) have offered an exciting opportunity to discover putative causal genes or risk factors associated with diseases by using SNPs as instrumental variables (IVs). However, conventional approaches assume linear causal relations, partly for simplicity and partly to accommodate GWAS summary data. In this work, we propose a novel model for transcriptome-wide association studies (TWAS) that incorporates nonlinear relationships across IVs, an exposure/gene, and an outcome; the model is robust against violations of the valid IV assumptions, permits the use of GWAS summary data, and covers two-stage least squares as a special case. We decouple the estimation of a marginal causal effect from that of a nonlinear transformation: the former is estimated via sliced inverse regression and a sparse instrumental variable regression, and the latter via a ratio-adjusted inverse regression. On this basis, we propose an inferential procedure. An application of the proposed method to the ADNI gene expression data and the IGAP GWAS summary data identifies 18 causal genes associated with Alzheimer's disease, including APOE and TOMM40, in addition to 7 other genes missed by two-stage least squares, which considers only linear relationships. Our findings suggest that nonlinear modeling is required to unleash the power of IV regression for identifying potentially nonlinear gene-trait associations. Accompanying this paper is our Python library nl-causal (https://nonlinear-causal.readthedocs.io/), which implements the proposed method.
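
The abstract names sliced inverse regression (SIR) as one ingredient for estimating the marginal causal effect. As a rough illustration of that ingredient only, the sketch below implements textbook SIR in NumPy and applies it to simulated individual-level data in which the exposure x is a monotone nonlinear transform of a linear instrument effect Z @ theta plus noise. Everything here is an assumption for illustration: the function name sliced_inverse_regression, the simulated model, and the parameters (n_slices=10, theta, sample size) are hypothetical. This is not the nl-causal API, and it omits the summary-statistics step, the sparse IV regression, and the ratio-adjusted inverse regression described in the abstract.

```python
import numpy as np


def sliced_inverse_regression(Z, x, n_slices=10):
    """Textbook SIR: leading direction of E[Z | x] (hypothetical helper).

    Returns a unit-norm vector proportional to the index direction;
    it is identified only up to sign and scale.
    """
    n, p = Z.shape
    # Center Z and whiten it with the inverse square root of its covariance.
    Z_centered = Z - Z.mean(axis=0)
    cov = np.cov(Z_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    cov_inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    Z_std = Z_centered @ cov_inv_sqrt

    # Slice the observations into groups of similar x values.
    order = np.argsort(x)
    slices = np.array_split(order, n_slices)

    # Weighted covariance of the within-slice means of the whitened Z.
    M = np.zeros((p, p))
    for idx in slices:
        m = Z_std[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # eigh sorts eigenvalues ascending, so the last eigenvector is the leading one;
    # map it back from the whitened scale to the original Z scale.
    _, vecs = np.linalg.eigh(M)
    direction = cov_inv_sqrt @ vecs[:, -1]
    return direction / np.linalg.norm(direction)


# Hypothetical simulation: x = exp(Z @ theta + noise), so log(x) is linear in Z
# while x itself depends on the instruments nonlinearly.
rng = np.random.default_rng(0)
n, p = 2000, 5
theta = np.array([1.0, -0.5, 0.0, 0.0, 0.8])
Z = rng.standard_normal((n, p))
x = np.exp(Z @ theta + 0.3 * rng.standard_normal(n))

theta_hat = sliced_inverse_regression(Z, x)
# Compare directions only, since SIR cannot recover the sign or scale of theta.
theta_unit = theta / np.linalg.norm(theta)
cosine = abs(theta_hat @ theta_unit)
print(f"|cos(theta_hat, theta)| = {cosine:.3f}")
```

Because SIR uses only the ordering of x within slices, it recovers the direction of theta without knowing the transform (here exp); the unknown scale is why the example compares normalized vectors rather than raw coefficients.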
