Computer Science (计算机科学), 2023, Vol. 50, Issue 12: 104-112. doi: 10.11896/jsjkx.221000167
孔凤玲, 吴昊, 董庆庆
KONG Fengling, WU Hao, DONG Qingqing
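Abstract: Clustering of single-cell data plays an important role in bioinformatics analysis. However, owing to the limitations of sequencing principles and sequencing platforms, single-cell datasets commonly suffer from high dimensionality and sparsity, high-variance noise, and missing gene data, so the clustering analysis and application of single-cell data still face many challenges. Existing single-cell clustering methods mainly model the relationship between cells and gene expression. They neither fully mine the latent feature relationships among cells nor remove noise, which leads to unsatisfactory clustering results and hinders subsequent data analysis. To address these problems, this paper proposes a self-optimized single-cell clustering algorithm that combines the zero-inflated negative binomial (ZINB) model with a graph attention autoencoder (Self-optimized Single Cell Clustering Using ZINB Model and Graph Attention Autoencoder, scZDGAC). The algorithm first uses the ZINB model together with the scalable DCA denoising algorithm: fitting the feature distribution of the data with the ZINB distribution improves the denoising performance of the autoencoder and reduces the impact of noise and data loss on the output of the KNN algorithm. A graph attention autoencoder then propagates information between cells with different weights, better capturing the latent features among cells for clustering. Finally, scZDGAC adopts a self-optimization approach so that the clustering module and the feature module, originally independent of each other, benefit one another, iteratively updating the cluster centers to further improve clustering performance. To evaluate the clustering results, two general evaluation metrics are used: the adjusted Rand index (ARI) and normalized mutual information (NMI). Comparative experiments against other algorithms on six single-cell datasets of different scales show that the proposed algorithm substantially improves clustering performance over the other methods and demonstrate its robustness.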
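The abstract does not state the ZINB likelihood explicitly. As an illustrative sketch only, the following assumes the common parameterization with mean `mu`, dispersion `theta`, and zero-inflation probability `pi`, in which a count is an excess zero with probability `pi` and otherwise follows a negative binomial distribution. The function names are hypothetical and are not taken from the paper or from any library.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial distribution
    with mean mu > 0 and dispersion theta > 0 (assumed parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a zero-inflated negative binomial.

    pi in [0, 1) is the probability that an observation is an excess zero.
    """
    nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero is either an excess zero or a genuine NB zero.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    # A non-zero count can only come from the NB component.
    return math.log(1.0 - pi) + nb


def zinb_nll(counts, mu, theta, pi):
    """Negative log-likelihood of a list of counts sharing one (mu, theta, pi)."""
    return -sum(zinb_log_pmf(x, mu, theta, pi) for x in counts)


if __name__ == "__main__":
    # Toy expression vector for one gene across a few cells (made-up values).
    counts = [0, 0, 0, 3, 1, 0, 7, 0]
    print(zinb_nll(counts, mu=2.0, theta=1.5, pi=0.3))
```

In a denoising autoencoder of the kind the abstract describes, a loss of this form would typically be evaluated per cell and per gene with `mu`, `theta`, and `pi` produced by the decoder. Here a single parameter triple is shared across all counts to keep the sketch short.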