Paper Title
GeoTyper: Automated Pipeline from Raw scRNA-Seq Data to Cell Type Identification
Paper Authors
Paper Abstract
The cellular composition of the tumor microenvironment can directly affect cancer progression and the efficacy of therapeutics. Understanding the activity of immune cells, the body's natural defense mechanism, in the vicinity of cancerous cells is essential for developing beneficial treatments. Single-cell RNA sequencing (scRNA-seq) enables the examination of gene expression on an individual-cell basis, providing crucial information about both the disturbances in cell function caused by cancer and cell-cell communication in the tumor microenvironment. This novel technique generates large amounts of data, which require proper processing. Various tools exist to facilitate this processing, but they need to be organized into a standardized workflow that runs from data wrangling to visualization, cell type identification, and analysis of changes in cellular activity, from the standpoint of both malignant cells and the immune stromal cells that eliminate them. We aimed to develop a standardized pipeline (GeoTyper, https://github.com/celineyayifeng/GeoTyper) that integrates multiple scRNA-seq tools for processing raw sequence data extracted from NCBI GEO, visualizing results, performing statistical analysis, and identifying cell types. The pipeline leverages existing tools, such as Cell Ranger from 10x Genomics, Alevin, and Seurat, to cluster cells and identify cell types based on gene expression profiles. We tested and validated the pipeline on several publicly available scRNA-seq datasets, obtaining clusters that correspond to distinct cell types. By determining the cell types and their respective frequencies in the tumor microenvironment across multiple cancers, this workflow will help quantify changes in gene expression related to cell-cell communication and identify possible therapeutic targets.
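To make the final steps of the workflow concrete, the following minimal sketch illustrates marker-based cluster annotation and cell type frequency estimation. It is not GeoTyper's implementation, which relies on Cell Ranger, Alevin, and Seurat. The marker table, the function names, and the toy expression values are all assumptions introduced here for illustration, and the scoring rule (average marker expression per cluster) is a deliberately simple stand-in for the profile-based identification described above.

```python
from collections import Counter

# Hypothetical marker genes per cell type (illustrative only, not taken from GeoTyper).
MARKERS = {
    "T cell": ["CD3D", "CD3E"],
    "B cell": ["MS4A1", "CD79A"],
    "Epithelial": ["EPCAM", "KRT8"],
}


def score_cluster(mean_expr, genes):
    """Average a cluster's mean expression over a marker list; absent genes count as 0."""
    return sum(mean_expr.get(g, 0.0) for g in genes) / len(genes)


def annotate_clusters(cluster_means, markers=MARKERS):
    """Label each cluster with the cell type whose markers score highest."""
    labels = {}
    for cluster, mean_expr in cluster_means.items():
        scores = {ct: score_cluster(mean_expr, genes) for ct, genes in markers.items()}
        labels[cluster] = max(scores, key=scores.get)
    return labels


def cell_type_frequencies(cell_clusters, labels):
    """Return the fraction of cells assigned to each annotated cell type."""
    counts = Counter(labels[c] for c in cell_clusters)
    total = len(cell_clusters)
    return {ct: n / total for ct, n in counts.items()}


# Toy input (assumed values): mean expression per cluster and one cluster id per cell.
cluster_means = {
    0: {"CD3D": 2.1, "CD3E": 1.8, "MS4A1": 0.1},
    1: {"MS4A1": 2.5, "CD79A": 2.0, "EPCAM": 0.0},
    2: {"EPCAM": 3.0, "KRT8": 2.2, "CD3D": 0.2},
}
cell_clusters = [0, 0, 1, 2, 2, 2, 0, 1]

labels = annotate_clusters(cluster_means)
freqs = cell_type_frequencies(cell_clusters, labels)
print(labels)
print(freqs)
```

In practice the cluster means would come from the clustering output of the upstream tools, and the marker table would be replaced by a curated reference appropriate to the tissue under study.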