Targeted active learning for probabilistic models

Paper title

Targeted active learning for probabilistic models

Authors

Christopher Tosh, Mauricio Tec, Wesley Tansey

Abstract

A fundamental task in science is to design experiments that yield valuable insights about the system under study. Mathematically, these insights can be represented as a utility or risk function that shapes the value of conducting each experiment. We present PDBAL, a targeted active learning method that adaptively designs experiments to maximize scientific utility. PDBAL takes a user-specified risk function and combines it with a probabilistic model of the experimental outcomes to choose designs that rapidly converge on a high-utility model. We prove theoretical bounds on the label complexity of PDBAL and provide fast closed-form solutions for designing experiments with common exponential family likelihoods. In simulation studies, PDBAL consistently outperforms standard untargeted approaches that focus on maximizing expected information gain over the design space. Finally, we demonstrate the scientific potential of PDBAL through a study on a large cancer drug screen dataset where PDBAL quickly recovers the most efficacious drugs with a small fraction of the total number of experiments.
