3D Graph Contrastive Learning for Molecular Property Prediction

Title

3D Graph Contrastive Learning for Molecular Property Prediction

Authors

Moon, Kisung; Kwon, Sunyoung

Abstract

Self-supervised learning (SSL) learns data representations by exploiting the supervision inherent in the data itself. It has drawn attention in the drug field, where annotated data are scarce because the required experiments are time-consuming and expensive. SSL trained on large amounts of unlabeled data has shown excellent performance in molecular property prediction, but several issues remain. (1) Existing SSL models are large-scale, which limits their use where computing resources are insufficient. (2) In most cases, they do not utilize 3D structural information for molecular representation learning. The activity of a drug is closely related to the structure of the drug molecule, yet most current models use 3D information only partially or not at all. (3) Previous models that apply contrastive learning to molecules rely on augmentations that permute atoms and bonds, so molecules with different characteristics can be treated as positive samples of one another. To solve these problems, we propose a novel contrastive learning framework for molecular property prediction: small-scale 3D Graph Contrastive Learning (3DGCL). 3DGCL learns molecular representations that reflect the molecule's structure through a pre-training process that does not change the semantics of the drug. Using only 1,128 pre-training samples and 1 million model parameters, we achieve state-of-the-art or comparable performance on four regression benchmark datasets. Extensive experiments demonstrate that 3D structural information based on chemical knowledge is essential to molecular representation learning for property prediction.
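The abstract does not state which contrastive objective 3DGCL optimizes. As a hedged illustration of the general idea only, the sketch below implements NT-Xent (normalized temperature-scaled cross-entropy), the loss popularized by SimCLR and commonly used in graph contrastive learning. The function names `cosine` and `nt_xent`, the temperature `tau=0.1`, and the toy 2D embeddings are illustrative assumptions, not the authors' code or settings.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes both embeddings are non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(z1: List[Vector], z2: List[Vector], tau: float = 0.1) -> float:
    """Generic NT-Xent loss over a batch of N positive pairs (illustrative).

    z1[i] and z2[i] are embeddings of two views of molecule i. For each of the
    2N embeddings, its partner view is the positive and the remaining 2N - 2
    embeddings in the batch act as negatives.
    """
    if len(z1) != len(z2) or not z1:
        raise ValueError("z1 and z2 must be non-empty and of equal length")
    n = len(z1)
    z = list(z1) + list(z2)  # stack both views: 2N embeddings
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of embedding i
        # Similarity logits against every other embedding (positive included).
        logits = [cosine(z[i], z[k]) / tau for k in range(2 * n) if k != i]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability that i picks out its own partner view.
        total += -(cosine(z[i], z[j]) / tau - log_denom)
    return total / (2 * n)


aligned = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # views that agree with their partner give the lower loss
```

The loss is small when each view is most similar to its own partner and large when it is closer to another molecule's view. This is why the choice of augmentation matters for point (3): if permuting atoms or bonds turns a view into a chemically different molecule, the objective still pulls that pair together as a positive.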
