Structure-based Drug Design with Equivariant Diffusion Models

Paper Title

Structure-based Drug Design with Equivariant Diffusion Models

Authors

Schneuing, Arne; Harris, Charles; Du, Yuanqi; Didi, Kieran; Jamasb, Arian; Igashov, Ilia; Du, Weitao; Gomes, Carla; Blundell, Tom; Lio, Pietro; Welling, Max; Bronstein, Michael; Correia, Bruno

Abstract

Structure-based drug design (SBDD) aims to design small-molecule ligands that bind with high affinity and specificity to pre-determined protein targets. Generative SBDD methods leverage structural data of drugs in complex with their protein targets to propose new drug candidates. These approaches typically place one atom at a time in an autoregressive fashion using the binding pocket as well as previously added ligand atoms as context in each step. Recently a surge of diffusion generative models has entered this domain which hold promise to capture the statistical properties of natural ligands more faithfully. However, most existing methods focus exclusively on bottom-up de novo design of compounds or tackle other drug development challenges with task-specific models. The latter requires curation of suitable datasets, careful engineering of the models and retraining from scratch for each task. Here we show how a single pre-trained diffusion model can be applied to a broader range of problems, such as off-the-shelf property optimization, explicit negative design, and partial molecular design with inpainting. We formulate SBDD as a 3D-conditional generation problem and present DiffSBDD, an SE(3)-equivariant diffusion model that generates novel ligands conditioned on protein pockets. Our in silico experiments demonstrate that DiffSBDD captures the statistics of the ground truth data effectively. Furthermore, we show how additional constraints can be used to improve the generated drug candidates according to a variety of computational metrics. These results support the assumption that diffusion models represent the complex distribution of structural data more accurately than previous methods, and are able to incorporate additional design objectives and constraints changing nothing but the sampling strategy.
