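A novel, computationally tractable algorithm flags in big matrices every column associated in any way with others or a dependent variable, with much higher power when columns are linked like mutations in chromosomes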

Paper title

A novel, computationally tractable algorithm flags in big matrices every column associated in any way with others or a dependent variable, with much higher power when columns are linked like mutations in chromosomes

Authors

Antezana, Marcos A., Machado, Carlos A.

Abstract

Exhaustively scanning a big data matrix (DM) for subsets of independent variables (IVs) that are associated with a dependent variable (DV) is computationally tractable only for 1- and 2-IV effects. I present a highly computationally tractable Participation-In-Association Score (PAS) that, in a DM with markers, flags every column that is strongly associated with others. PAS examines no column subsets, and its computational cost grows linearly with the number of DM columns, remaining reasonable even in million-column DMs. PAS exploits how associations of markers in DM rows cause associations of matches in the rows' pairwise comparisons. For every such comparison with a match at a tested column, PAS computes the other matches by modifying the comparison's total matches (scored once per DM), yielding a distribution of conditional matches that is perturbed by associations of the tested column. Equally tractable is dvPAS, which flags DV-associated IVs by permuting the markers in the DV. P values are obtained by permutation and Sidak-corrected for multiple tests, bypassing model selection. Simulations show that PAS and dvPAS i) generate uniform-(0,1)-distributed type I error in null DMs and ii) detect randomly encountered binary and trinary models of significant n-column association and n-IV association with a binary DV, respectively, with power of the same order of magnitude as that of exhaustive evaluation and with false positives that are uniform-(0,1)-distributed or straightforwardly tuned to be so. Power to detect 2-way DV-associated 100-marker+ runs is non-parametrically ultimate, but power to detect pure n-column associations and pure n-IV DV associations sinks exponentially as n increases. Power increases about twofold in trinary vs. binary DMs, and in a major way when there are background associations like those between mutations in chromosomes, especially in trinary DMs, where dvPAS filters said background most effectively.
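The conditional-match step and the Sidak correction mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`total_matches`, `conditional_matches`, `sidak`) are hypothetical, the DM is assumed to be a small integer NumPy array with rows as observations and columns as markers, and the abstract does not say how the conditional-match distribution is summarised into the PAS statistic, so that part is deliberately left out.

```python
import numpy as np

def total_matches(dm):
    """Total number of matching columns for every pair of rows.

    Computed once per DM, as the abstract describes. Returns the row
    indices (a, b) of each pair and the per-pair match totals.
    """
    a, b = np.triu_indices(dm.shape[0], k=1)   # all unordered row pairs
    totals = (dm[a] == dm[b]).sum(axis=1)      # matches across all columns
    return a, b, totals

def conditional_matches(dm, a, b, totals, col):
    """Matches at the *other* columns, for row pairs that match at `col`.

    Assumption: "modifying the comparison's total matches" is read here
    as subtracting the one match contributed by the tested column.
    """
    match_at_col = dm[a, col] == dm[b, col]
    return totals[match_at_col] - 1

def sidak(p, m):
    """Standard Sidak correction of a p-value for m independent tests."""
    return 1.0 - (1.0 - p) ** m

# Toy 3-row, 3-column binary DM (illustrative data only).
dm = np.array([[0, 0, 1],
               [0, 1, 1],
               [1, 1, 0]])
a, b, totals = total_matches(dm)
cond_col0 = conditional_matches(dm, a, b, totals, col=0)
```

Because the pairwise totals are computed once and reused for every tested column, the per-column work in this sketch is a mask and a subtraction, which is consistent with the abstract's claim that cost grows linearly with the number of columns.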
