A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language

Paper Title

A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language

Authors

Su, Bing; Du, Dazhao; Yang, Zhao; Zhou, Yujie; Li, Jiangmeng; Rao, Anyi; Sun, Hao; Lu, Zhiwu; Wen, Ji-Rong

Abstract

Although artificial intelligence (AI) has made significant progress in understanding molecules in a wide range of fields, existing models generally acquire a single cognitive ability from a single molecular modality. Since the hierarchy of molecular knowledge is profound, even humans learn from different modalities, including both intuitive diagrams and professional texts, to assist their understanding. Inspired by this, we propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data (crawled from published Scientific Citation Index papers) via contrastive learning. This AI model represents a critical attempt to directly bridge molecular graphs and natural language. Importantly, by capturing the specific and complementary information of the two modalities, our proposed model can better grasp molecular expertise. Experimental results show that our model not only exhibits promising performance in cross-modal tasks such as cross-modal retrieval and molecule captioning, but also enhances molecular property prediction and is capable of generating meaningful molecular graphs from natural language descriptions. We believe that our model will have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine, among others.
