Title
Cell Type Identification from Single-Cell Transcriptomic Data via Semi-supervised Learning
Authors
Abstract
Cell type identification from single-cell transcriptomic data is a common goal of single-cell RNA sequencing (scRNAseq) data analysis. Neural networks have been employed to identify cell types from scRNAseq data with high performance. However, this approach requires a large number of individual cells with accurate and unbiased type annotations to build the identification models. Unfortunately, labeling scRNAseq data is cumbersome and time-consuming because it involves manual inspection of marker genes. To overcome this challenge, we propose a semi-supervised learning model that uses unlabeled scRNAseq cells together with a limited number of labeled scRNAseq cells to perform cell type identification. First, we transform the scRNAseq cells into "gene sentences", a representation inspired by the similarities between natural language and gene systems. The genes in these sentences are then represented as gene embeddings to reduce data sparsity. With these embeddings, we implement a semi-supervised learning model based on recurrent convolutional neural networks (RCNN), which consists of a shared network, a supervised network, and an unsupervised network. The proposed model is evaluated on macosko2015, a large-scale single-cell transcriptomic dataset with ground-truth labels for individual cell types. We observe that the proposed model achieves encouraging performance by learning from a very limited number of labeled scRNAseq cells together with a large number of unlabeled scRNAseq cells.
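The abstract does not say how a cell is turned into a "gene sentence", so the following is only a minimal sketch of one plausible reading. It assumes that a sentence lists the expressed genes of a cell in order of decreasing expression, and that each gene is then mapped to an integer id that an embedding layer could look up. The function names `cell_to_gene_sentence` and `build_vocabulary`, the ordering rule, the padding id, and the toy gene names and counts are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional


def cell_to_gene_sentence(
    counts: Dict[str, float], max_len: Optional[int] = None
) -> List[str]:
    """Turn one cell's expression profile into an ordered list of gene names.

    Assumption (not from the paper): genes with zero counts are dropped and
    the rest are ordered by decreasing expression.
    """
    expressed = [(gene, value) for gene, value in counts.items() if value > 0]
    # Sort by descending expression; break ties alphabetically for determinism.
    expressed.sort(key=lambda item: (-item[1], item[0]))
    sentence = [gene for gene, _ in expressed]
    return sentence if max_len is None else sentence[:max_len]


def build_vocabulary(sentences: List[List[str]]) -> Dict[str, int]:
    """Assign each gene an integer id; id 0 is left free for padding."""
    vocab: Dict[str, int] = {}
    for sentence in sentences:
        for gene in sentence:
            if gene not in vocab:
                vocab[gene] = len(vocab) + 1
    return vocab


# Toy expression profiles with made-up gene names and counts.
cells = [
    {"GeneA": 12.0, "GeneB": 5.0, "GeneC": 3.0, "GeneD": 0.0},
    {"GeneD": 7.0, "GeneB": 7.0, "GeneA": 0.0},
]

sentences = [cell_to_gene_sentence(cell) for cell in cells]
vocab = build_vocabulary(sentences)
# Integer-encoded sentences, ready to index into an embedding table.
encoded = [[vocab[gene] for gene in sentence] for sentence in sentences]
```

Under these assumptions, each cell becomes a short sequence of gene ids rather than a long, mostly zero expression vector, which is one way to read the abstract's remark that gene embeddings reduce data sparsity.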