Paper Title

Deep Direct Likelihood Knockoffs

Authors

Mukund Sudarshan, Wesley Tansey, Rajesh Ranganath

Abstract

Predictive modeling often uses black box machine learning methods, such as deep neural networks, to achieve state-of-the-art performance. In scientific domains, the scientist often wishes to discover which features are actually important for making the predictions. These discoveries may lead to costly follow-up experiments and as such it is important that the error rate on discoveries is not too high. Model-X knockoffs enable important features to be discovered with control of the FDR. However, knockoffs require rich generative models capable of accurately modeling the knockoff features while ensuring they obey the so-called "swap" property. We develop Deep Direct Likelihood Knockoffs (DDLK), which directly minimizes the KL divergence implied by the knockoff swap property. DDLK consists of two stages: it first maximizes the explicit likelihood of the features, then minimizes the KL divergence between the joint distribution of features and knockoffs and any swap between them. To ensure that the generated knockoffs are valid under any possible swap, DDLK uses the Gumbel-Softmax trick to optimize the knockoff generator under the worst-case swap. We find DDLK has higher power than baselines while controlling the false discovery rate on a variety of synthetic and real benchmarks including a task involving a large dataset from one of the epicenters of COVID-19.
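
The abstract describes a two-stage procedure: first fit an explicit likelihood to the features, then train a knockoff generator to minimize the KL divergence between the joint distribution of features and knockoffs and its swapped version, with the swap chosen adversarially via the Gumbel-Softmax trick. The sketch below is a minimal illustration of that structure, not the authors' implementation: it assumes a toy diagonal-Gaussian feature model and a Gaussian knockoff generator, and all names (`FeatureModel`, `KnockoffGenerator`, `swap_logits`), learning rates, and iteration counts are illustrative.

```python
# Minimal two-stage sketch of the DDLK idea, under the toy assumptions stated above.
import torch
import torch.nn as nn
import torch.nn.functional as F

d = 5                                   # number of features (toy setting)
x_data = torch.randn(2048, d)           # stand-in for a real feature matrix


class FeatureModel(nn.Module):
    """Explicit feature likelihood q_theta(x); a diagonal Gaussian for simplicity."""
    def __init__(self, d):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(d))
        self.log_sigma = nn.Parameter(torch.zeros(d))

    def log_prob(self, x):
        dist = torch.distributions.Normal(self.mu, self.log_sigma.exp())
        return dist.log_prob(x).sum(-1)


class KnockoffGenerator(nn.Module):
    """Conditional Gaussian q_phi(x_tilde | x) with reparameterized sampling."""
    def __init__(self, d, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * d))

    def sample(self, x):
        mu, log_sigma = self.net(x).chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * log_sigma.exp()

    def log_prob(self, x, x_tilde):
        mu, log_sigma = self.net(x).chunk(2, dim=-1)
        dist = torch.distributions.Normal(mu, log_sigma.exp())
        return dist.log_prob(x_tilde).sum(-1)


feature_model = FeatureModel(d)
generator = KnockoffGenerator(d)
swap_logits = nn.Parameter(torch.zeros(d))    # one relaxed swap indicator per feature

# Stage 1: maximize the explicit likelihood of the features.
opt_feat = torch.optim.Adam(feature_model.parameters(), lr=1e-2)
for _ in range(200):
    opt_feat.zero_grad()
    nll = -feature_model.log_prob(x_data).mean()
    nll.backward()
    opt_feat.step()
for p in feature_model.parameters():
    p.requires_grad_(False)                   # freeze q_theta for stage 2


def joint_log_prob(x, x_tilde):
    # log q(x, x_tilde) = log q_theta(x) + log q_phi(x_tilde | x)
    return feature_model.log_prob(x) + generator.log_prob(x, x_tilde)


# Stage 2: minimize a Monte Carlo estimate of
# KL( q(x, x_tilde) || q(swap_S(x, x_tilde)) ),
# with the swap S chosen adversarially via a Gumbel-Softmax relaxation.
opt_gen = torch.optim.Adam(generator.parameters(), lr=1e-3)
for _ in range(500):
    x = x_data[torch.randint(len(x_data), (256,))]
    x_tilde = generator.sample(x)

    # Relaxed per-feature swap indicators b_j in (0, 1); b_j near 1 means "swap feature j".
    b = F.gumbel_softmax(torch.stack([swap_logits, -swap_logits], dim=-1),
                         tau=0.5, hard=False)[..., 0]
    x_swap = b * x_tilde + (1 - b) * x
    x_tilde_swap = b * x + (1 - b) * x_tilde

    kl = (joint_log_prob(x, x_tilde) - joint_log_prob(x_swap, x_tilde_swap)).mean()

    opt_gen.zero_grad()
    if swap_logits.grad is not None:
        swap_logits.grad = None
    kl.backward()
    opt_gen.step()                                    # generator minimizes the KL
    with torch.no_grad():
        swap_logits += 1e-2 * swap_logits.grad        # swap parameters ascend (worst case)
```

The gradient-ascent step on `swap_logits` plays the role of the worst-case swap; setting `hard=True` in `gumbel_softmax` would make the sampled swaps discrete while keeping them differentiable through the straight-through estimator. The paper itself uses richer density models than the Gaussians assumed here.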
