Paper Title

The Synthesizability of Molecules Proposed by Generative Models

Authors

Gao, Wenhao; Coley, Connor W.

Abstract

The discovery of functional molecules is an expensive and time-consuming process, exemplified by the rising costs of small molecule therapeutic discovery. One class of techniques of growing interest for early-stage drug discovery is de novo molecular generation and optimization, catalyzed by the development of new deep learning approaches. These techniques can suggest novel molecular structures intended to maximize a multi-objective function, e.g., suitability as a therapeutic against a particular target, without relying on brute-force exploration of a chemical space. However, the utility of these approaches is stymied by ignorance of synthesizability. To highlight the severity of this issue, we use a data-driven computer-aided synthesis planning program to quantify how often molecules proposed by state-of-the-art generative models cannot be readily synthesized. Our analysis demonstrates that there are several tasks for which these models generate unrealistic molecular structures despite performing well on popular quantitative benchmarks. Synthetic complexity heuristics can successfully bias generation toward synthetically tractable chemical space, although doing so necessarily detracts from the primary objective. This analysis suggests that to improve the utility of these models in real discovery workflows, new algorithm development is warranted.
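The two quantities the abstract refers to can be illustrated with a minimal Python sketch. It is not the authors' pipeline: `find_route`, `toy_planner`, `penalized_score`, and the `weight` parameter are hypothetical names introduced here, and the toy planner is a placeholder for a real computer-aided synthesis planning program. The sketch only shows (1) counting how often a planner returns no route for a proposed molecule and (2) subtracting a synthetic-complexity penalty from a primary objective, which is one simple way a heuristic could bias generation at some cost to that objective.

```python
from typing import Callable, Optional, Sequence


def unsynthesizable_fraction(
    smiles_list: Sequence[str],
    find_route: Callable[[str], Optional[list]],
) -> float:
    """Fraction of molecules for which the planner returns no route (None)."""
    if not smiles_list:
        raise ValueError("need at least one molecule")
    failures = sum(1 for smiles in smiles_list if find_route(smiles) is None)
    return failures / len(smiles_list)


def penalized_score(objective: float, complexity: float, weight: float = 0.5) -> float:
    """Primary objective minus a weighted synthetic-complexity penalty.

    A larger `weight` pushes the search toward simpler molecules but
    lowers the achievable value of the primary objective.
    """
    return objective - weight * complexity


def toy_planner(smiles: str) -> Optional[list]:
    """Hypothetical stand-in for a synthesis planner.

    Assumption for illustration only: short SMILES strings are treated as
    having a route; a real planner would search over reaction templates.
    """
    return ["purchasable"] if len(smiles) <= 10 else None


# Ethanol, benzene, paracetamol, aspirin (as SMILES strings).
proposed = ["CCO", "c1ccccc1", "CC(=O)Nc1ccc(O)cc1", "CC(=O)Oc1ccccc1C(=O)O"]
fraction = unsynthesizable_fraction(proposed, toy_planner)
print(f"fraction without a route: {fraction:.2f}")
```

In practice, `find_route` would wrap whichever synthesis planning tool is available, and `complexity` would come from a heuristic score chosen by the user; both are left abstract here.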
