Title
A novel, computationally tractable algorithm flags in big matrices every column associated in any way with others or a dependent variable, with much higher power when columns are linked like mutations in chromosomes
Authors
Abstract
Exhaustively scanning a big data matrix (DM) for subsets of independent variables (IVs) that are associated with a dependent variable (DV) is computationally tractable only for 1- and 2-IV effects. I present a highly computationally tractable Participation-In-Association Score (PAS) that, in a DM of markers, flags every column that is strongly associated with other columns. PAS examines no column subsets, and its computational cost grows linearly with the number of DM columns, remaining reasonable even in million-column DMs. PAS exploits the fact that associations between markers in DM rows cause associations between matches in the rows' pairwise comparisons. For every such comparison with a match at a tested column, PAS computes the matches at the other columns by modifying the comparison's total matches (scored once per DM), yielding a distribution of conditional matches that is perturbed by the tested column's associations. Equally tractable is dvPAS, which flags DV-associated IVs by permuting the markers in the DV. P values are obtained by permutation and Sidak-corrected for multiple tests, bypassing model selection. Simulations show that (i) PAS and dvPAS generate uniform(0,1)-distributed type I error in null DMs, and (ii) they detect randomly encountered binary and trinary models of significant n-column association and of n-IV association with a binary DV, respectively, with power on the order of magnitude of that of exhaustive evaluation and with false positives that are uniform(0,1)-distributed or can be straightforwardly tuned to be so. The power to detect 2-way DV-associated runs of 100 or more markers is non-parametrically ultimate, but the power to detect pure n-column associations and pure n-IV DV associations sinks exponentially as n increases. Power roughly doubles in trinary versus binary DMs and increases substantially when there are background associations, such as those between mutations in chromosomes, especially in trinary DMs, where dvPAS filters this background most effectively.
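The PAS idea summarized in the abstract (pairwise row comparisons, total matches scored once per DM, conditional matches at comparisons that match at a tested column, permutation p values, Sidak correction) can be illustrated with a minimal Python sketch. It is not the author's implementation: the score used here (mean other-column matches in comparisons that match at the tested column minus the mean in comparisons that do not), the choice to permute the tested column, and the names `pairwise_matches`, `pas_like_pvalues`, `n_perm` and `seed` are all hypothetical. Under these assumptions, a comparison's matches at the other columns are its total matches minus its 0/1 match at the tested column, so the totals are computed only once, each column test costs the same, and the corrected p value for m tested columns is 1 - (1 - p)^m.

```python
import numpy as np


def pairwise_matches(dm):
    """Compare every pair of rows i < j. Return a (n_pairs, n_cols) boolean
    array that is True where the two rows carry the same marker, plus i and j."""
    i, j = np.triu_indices(dm.shape[0], k=1)
    return dm[i] == dm[j], i, j


def pas_like_pvalues(dm, n_perm=999, seed=0):
    """Hypothetical PAS-style scan: one Sidak-corrected permutation p value
    per column. The score and permutation scheme are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    matches, i, j = pairwise_matches(dm)
    total = matches.sum(axis=1)  # total matches per comparison, computed once per DM
    n_cols = dm.shape[1]
    raw_p = np.empty(n_cols)
    for c in range(n_cols):
        # Matches at the other columns = total minus the 0/1 match at column c.
        # Permuting column c leaves these counts unchanged.
        other = total - matches[:, c].astype(int)

        def score(match_c):
            # Mean other-column matches in comparisons that match at c minus the
            # mean in those that do not: close to 0 when c is independent of
            # the other columns, larger when c is associated with them.
            if match_c.all() or not match_c.any():
                return 0.0
            return other[match_c].mean() - other[~match_c].mean()

        observed = score(matches[:, c])
        exceed = 0
        for _ in range(n_perm):
            # Shuffle column c across rows: breaks its associations, keeps its marginal.
            shuffled = rng.permutation(dm[:, c])
            if score(shuffled[i] == shuffled[j]) >= observed:
                exceed += 1
        raw_p[c] = (1 + exceed) / (1 + n_perm)
    # Sidak correction for having tested every column.
    return 1.0 - (1.0 - raw_p) ** n_cols


# Toy binary DM: 60 rows x 20 columns; columns 1 and 2 copy column 0.
rng = np.random.default_rng(1)
dm = rng.integers(0, 2, size=(60, 20))
dm[:, 1] = dm[:, 0]
dm[:, 2] = dm[:, 0]
adj = pas_like_pvalues(dm)
print(np.round(adj, 3))
```

In this toy DM, columns 0 to 2 should be flagged while the independent columns should mostly receive large corrected p values. Because the smallest attainable permutation p value is 1/(n_perm + 1), the Sidak-corrected value for m columns can never fall below 1 - (1 - 1/(n_perm + 1))^m, so n_perm has to grow with the number of tested columns for anything to be flagged at a fixed level. The sketch also keeps every pairwise comparison in memory, so it suits only small DMs.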