Paper Title

Bayesian Sequential Stacking Algorithm for Concurrently Designing Molecules and Synthetic Reaction Networks

Authors

Zhang, Qi; Liu, Chang; Wu, Stephen; Yoshida, Ryo

Abstract

In the last few years, de novo molecular design using machine learning has made great technical progress, but its practical deployment has been less successful. This is mostly owing to the cost and technical difficulty of synthesizing such computationally designed molecules. To overcome these barriers, various methods for synthetic route design using deep neural networks have been studied intensively in recent years. However, little progress has been made in designing molecules and their synthetic routes simultaneously. Here, we formulate the problem of simultaneously designing molecules with a desired set of properties and their synthetic routes within the framework of Bayesian inference. The design variables consist of a set of reactants in a reaction network and its network topology. The design space is extremely large because it consists of all combinations of purchasable reactants, often on the order of millions or more. In addition, the designed reaction networks can adopt any topology beyond simple multistep linear reaction routes. To solve this hard combinatorial problem, we present a powerful sequential Monte Carlo algorithm that recursively designs a synthetic reaction network by sequentially building up single-step reactions. In a case study of designing drug-like molecules based on commercially available compounds, the proposed method shows overwhelming performance compared with heuristic combinatorial search methods in terms of computational efficiency, coverage, and novelty with respect to existing compounds.
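The idea of building a route one single-step reaction at a time with sequential Monte Carlo can be illustrated with a minimal toy sketch. This is not the paper's implementation: the integer "reactants", the additive `react` function, the Gaussian `likelihood`, the tempering schedule, and all parameter names are hypothetical stand-ins chosen for illustration, and the sketch covers only linear routes rather than general network topologies.

```python
import math
import random

# Hypothetical toy stand-ins (not from the paper): a "reactant" is an integer,
# a "single-step reaction" adds a reactant to the current product, and the
# "property" of a product is simply its value.
REACTANTS = list(range(1, 21))


def react(product, reactant):
    """Toy single-step reaction: combine the current product with one reactant."""
    return product + reactant


def likelihood(product, target, sigma):
    """Gaussian score of how close the product's property is to the target."""
    return math.exp(-0.5 * ((product - target) / sigma) ** 2)


def smc_design(target, n_particles=500, n_steps=3, sigma=5.0, seed=0):
    """Grow linear routes one reaction at a time with sequential Monte Carlo."""
    rng = random.Random(seed)
    # Tempering schedule (an assumption): loose tolerance early, tight at the end.
    sigmas = [sigma * 2 ** (n_steps - t) for t in range(n_steps + 1)]

    # Step 0: draw starting reactants from a uniform prior, weight, resample.
    particles = [([r], r) for r in rng.choices(REACTANTS, k=n_particles)]
    weights = [likelihood(p, target, sigmas[0]) for _, p in particles]
    particles = rng.choices(particles, weights=weights, k=n_particles)

    for t in range(1, n_steps + 1):
        proposed, weights = [], []
        for route, product in particles:
            r = rng.choice(REACTANTS)        # propose the next reactant
            new_product = react(product, r)  # stack one more reaction
            proposed.append((route + [r], new_product))
            # Incremental weight: new tempered score over the previous one.
            weights.append(
                likelihood(new_product, target, sigmas[t])
                / likelihood(product, target, sigmas[t - 1])
            )
        # Resample so that promising partial routes are duplicated.
        particles = rng.choices(proposed, weights=weights, k=n_particles)
    return particles


designs = smc_design(target=60)
mean_error = sum(abs(p - 60) for _, p in designs) / len(designs)
```

Each particle is a pair of a route (the list of reactants used so far) and its current product. Resampling after every stacked reaction concentrates the population on routes whose products approach the target property, which is the general mechanism that lets a sequential sampler avoid enumerating every reactant combination.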