Controlled Discovery and Localization of Signals via Bayesian Linear Programming

Paper title

Controlled Discovery and Localization of Signals via Bayesian Linear Programming

Authors

Asher Spector, Lucas Janson

Abstract

Scientists often must simultaneously localize and discover signals. For instance, in genetic fine-mapping, high correlations between nearby genetic variants make it hard to identify the exact locations of causal variants. So the statistical task is to output as many disjoint regions containing a signal as possible, each as small as possible, while controlling false positives. Similar problems arise in any application where signals cannot be perfectly localized, such as locating stars in astronomical surveys and changepoint detection in sequential data. Common Bayesian approaches to these problems involve computing a posterior distribution over signal locations. However, existing procedures to translate these posteriors into actual credible regions for the signals fail to capture all the information in the posterior, leading to lower power and (sometimes) inflated false discoveries. With this motivation, we introduce Bayesian Linear Programming (BLiP). Given a posterior distribution over signals, BLiP outputs credible regions for signals which verifiably nearly maximize expected power while controlling false positives. BLiP overcomes an extremely high-dimensional and nonconvex problem to verifiably nearly maximize expected power while controlling false positives. BLiP is very computationally efficient compared to the cost of computing the posterior and can wrap around nearly any Bayesian model and algorithm. Applying BLiP to existing state-of-the-art analyses of UK Biobank data (for genetic fine-mapping) and the Sloan Digital Sky Survey (for astronomical point source detection) increased power by 30-120% in just a few minutes of additional computation. BLiP is implemented in pyblip (Python) and blipr (R).
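The selection task described in the abstract can be illustrated with a toy sketch. This is not pyblip's implementation: it replaces the linear program with brute-force enumeration over a handful of candidate regions, and it assumes (for illustration only) that a region G is rewarded with weight 1/|G| and that false positives are controlled by requiring the posterior expected fraction of false discoveries to be at most q. The function name `select_regions` and the example probabilities are hypothetical.

```python
from itertools import combinations


def select_regions(candidates, q):
    """Pick disjoint regions maximizing expected weighted power.

    candidates: list of (region, p) pairs, where region is a frozenset of
        locations and p is the posterior probability that the region
        contains at least one signal.
    q: target level for the expected fraction of false discoveries.

    Brute force over all subsets; only suitable for tiny toy inputs.
    """
    best, best_power = (), 0.0
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            # Disjointness: no location may appear in two selected regions.
            locations = [loc for region, _ in subset for loc in region]
            if len(locations) != len(set(locations)):
                continue
            # Error control (assumed form): expected number of false
            # discoveries is at most q times the number of discoveries.
            expected_false = sum(1.0 - p for _, p in subset)
            if expected_false > q * len(subset):
                continue
            # Expected power with the assumed weight 1 / |region|:
            # smaller regions earn more, so localization is rewarded.
            power = sum(p / len(region) for region, p in subset)
            if power > best_power:
                best, best_power = subset, power
    return [region for region, _ in best], best_power


# Locations 0 and 1 are highly correlated: neither is individually
# convincing, but together they almost surely contain a signal.
candidates = [
    (frozenset({0}), 0.50),
    (frozenset({1}), 0.48),
    (frozenset({0, 1}), 0.98),
    (frozenset({2}), 0.97),
]
regions, power = select_regions(candidates, q=0.05)
print(regions, round(power, 2))
```

In this toy input, neither location 0 nor location 1 can be reported on its own at q = 0.05, but the pair {0, 1} can, alongside the well-localized signal at location 2. Enumeration is exponential in the number of candidates, which is why a real problem needs a scalable optimization approach such as the linear programming the abstract refers to.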
