Paper title

Eigenvector-based sparse canonical correlation analysis: Fast computation for estimation of multiple canonical vectors

Authors

Wenjia Wang, Yi-Hui Zhou

Abstract

Classical canonical correlation analysis (CCA) requires the matrices to be low dimensional, i.e., the number of features cannot exceed the sample size. Recent developments in CCA have mainly focused on the high-dimensional setting, where the number of features in both matrices under analysis greatly exceeds the sample size. These approaches impose penalties on optimization problems that must be solved iteratively, and they estimate multiple canonical vectors sequentially. In this work, we provide an explicit link between sparse multiple regression and sparse canonical correlation analysis, together with an efficient algorithm that estimates multiple canonical pairs simultaneously rather than sequentially. Furthermore, the algorithm naturally allows parallel computing. These properties make the algorithm highly efficient. We provide theoretical results on the consistency of the canonical pairs. The algorithm and the theoretical development are based on solving an eigenvector problem, which significantly differentiates our method from existing methods. Simulation results support the improved performance of the proposed approach. We apply eigenvector-based CCA to the analysis of GTEx thyroid histology images, the analysis of SNPs and RNA-seq gene expression data, and a microbiome study. The real data analyses also show improved performance compared to traditional sparse CCA.
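For readers who want a concrete picture of the eigenvector view mentioned in the abstract, the following is a minimal sketch of classical (non-sparse) CCA written as a symmetric eigenproblem in NumPy. It is not the sparse algorithm proposed in the paper; the function name `classical_cca`, the helper `inv_sqrt`, and the simulated data are assumptions made purely for illustration. All `k` canonical pairs come out of a single eigendecomposition, and the sketch requires the sample covariance matrices to be invertible, which is why the classical method breaks down when the number of features exceeds the sample size.

```python
import numpy as np


def classical_cca(X, Y, k=2):
    """Classical CCA via one symmetric eigenproblem (illustrative only).

    Returns the top-k canonical correlations and the canonical vectors
    for X (columns of A) and Y (columns of B). Assumes the sample
    covariance matrices are invertible, i.e. more samples than features.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Sxx_is = inv_sqrt(Sxx)
    Syy_is = inv_sqrt(Syy)
    K = Sxx_is @ Sxy @ Syy_is

    # Eigenvectors of K K^T give all canonical directions at once;
    # the eigenvalues are the squared canonical correlations.
    w, U = np.linalg.eigh(K @ K.T)
    order = np.argsort(w)[::-1][:k]
    rho = np.sqrt(np.clip(w[order], 0.0, None))
    U = U[:, order]

    A = Sxx_is @ U                  # canonical vectors for X
    B = Syy_is @ (K.T @ U) / rho    # matching canonical vectors for Y
    return rho, A, B


# Simulated data with one shared latent factor (hypothetical example).
rng = np.random.default_rng(0)
n, p, q = 500, 4, 3
z = rng.standard_normal((n, 1))
X = rng.standard_normal((n, p)) + z
Y = rng.standard_normal((n, q)) + z

rho, A, B = classical_cca(X, Y, k=2)
print("canonical correlations:", rho)
```

In the high-dimensional setting the abstract describes, `Sxx` and `Syy` are singular, so the inverse square roots above do not exist; that is the gap that penalized or sparse formulations are meant to address.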
