Graph neural networks and attention-based CNN-LSTM for protein classification

Paper title

Graph neural networks and attention-based CNN-LSTM for protein classification

Authors

Shi, Zhuangwei; Li, Bo

Abstract

This paper addresses three key problems in protein classification. First, carbohydrate-active enzyme (CAZyme) classification helps in understanding the properties of enzymes; however, one CAZyme may belong to several classes, which makes CAZyme classification a multi-label problem. Second, to capture information from the secondary structure of a protein, protein classification is modeled as a graph classification problem. Third, compound-protein interaction prediction combines graph learning for the compound with sequential embedding for the protein, and can be seen as a classification task over compound-protein pairs. The paper proposes three models for these problems. First, it proposes a multi-label CAZyme classification model using a CNN-LSTM with an attention mechanism. Second, it proposes a subspace learning model based on a variational graph autoencoder for protein graph classification. Third, it proposes graph isomorphism networks (GIN) combined with an attention-based CNN-LSTM for compound-protein interaction prediction, and compares GIN with graph convolutional networks (GCN) and graph attention networks (GAT) on this task. The proposed models are effective for protein classification. Source code and data are available at https://github.com/zshicode/GNN-AttCL-protein. In addition, the repository collects and collates benchmark datasets for the above problems, including CAZyme classification, enzyme protein graph classification, compound-protein interaction prediction, drug-target affinity prediction, and drug-drug interaction prediction, making evaluation on these benchmark datasets more convenient.
