Paper title
3D pride without 2D prejudice: Bias-controlled multi-level generative models for structure-based ligand design
Paper authors
Abstract
Generative models for structure-based molecular design hold significant promise for drug discovery, with the potential to speed up the hit-to-lead development cycle while improving the quality of drug candidates and reducing costs. Data sparsity and bias, however, are two main roadblocks to the development of 3D-aware models. Here we propose a first-of-its-kind training protocol based on multi-level contrastive learning for improved bias control and data efficiency. The framework combines the large data resources available for 2D generative modelling with datasets of ligand-protein complexes. The results are hierarchical generative models that are topologically unbiased, explainable and customizable. We show how, by deconvolving the generative posterior into chemical, topological and structural context factors, we not only avoid common pitfalls in the design and evaluation of generative models but also gain detailed insight into the generative process itself. This improved transparency significantly aids method development, in addition to allowing fine-grained control over novelty versus familiarity.