Targeting SARS-CoV-2 with AI- and HPC-enabled Lead Generation: A First Data Release

Paper title

Targeting SARS-CoV-2 with AI- and HPC-enabled Lead Generation: A First Data Release

Authors

Yadu Babuji, Ben Blaiszik, Tom Brettin, Kyle Chard, Ryan Chard, Austin Clyde, Ian Foster, Zhi Hong, Shantenu Jha, Zhuozhao Li, Xuefeng Liu, Arvind Ramanathan, Yi Ren, Nicholaus Saint, Marcus Schwarting, Rick Stevens, Hubertus van Dam, Rick Wagner

Abstract

Researchers across the globe are seeking to rapidly repurpose existing drugs or discover new drugs to counter the novel coronavirus disease (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). One promising approach is to train machine learning (ML) and artificial intelligence (AI) tools to screen large numbers of small molecules. As a contribution to that effort, we are aggregating numerous small molecules from a variety of sources, using high-performance computing (HPC) to compute diverse properties of those molecules, using the computed properties to train ML/AI models, and then using the resulting models for screening. In this first data release, we make available 23 datasets collected from community sources representing over 4.2 B molecules enriched with pre-computed: 1) molecular fingerprints to aid similarity searches, 2) 2D images of molecules to enable exploration and application of image-based deep learning methods, and 3) 2D and 3D molecular descriptors to speed development of machine learning models. This data release encompasses structural information on the 4.2 B molecules and 60 TB of pre-computed data. Future releases will expand the data to include more detailed molecular simulations, computed models, and other products.
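To illustrate how pre-computed fingerprints can aid a similarity search, the following is a minimal sketch rather than the authors' pipeline. The abstract does not say which fingerprint type or similarity measure is used; this sketch assumes bit-vector fingerprints stored as sets of on-bit indices and ranks by Tanimoto similarity, a commonly used measure. The molecule names, bit indices, and the helper names `tanimoto` and `most_similar` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a fingerprint is represented by the indices of its set bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: shared on-bits divided by total distinct on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: define similarity as 0
    return len(a & b) / union


def most_similar(
    query: Fingerprint, library: Dict[str, Fingerprint], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank library molecules by similarity to the query, highest first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy data, invented for illustration only (not taken from the data release).
library = {
    "mol_a": frozenset({1, 4, 7, 9}),
    "mol_b": frozenset({1, 4, 8}),
    "mol_c": frozenset({2, 3, 5}),
}
query = frozenset({1, 4, 7})
ranked = most_similar(query, library, top_k=2)
print(ranked)
```

Because the fingerprints are computed once and stored, a search of this kind only needs set operations at query time. At the scale of billions of molecules, a real system would likely use packed bit vectors and an index rather than a linear scan.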
