Computer Science ›› 2023, Vol. 50 ›› Issue (12): 104-112. doi: 10.11896/jsjkx.221000167

• Database & Big Data & Data Science •

Self-optimized Single Cell Clustering Using ZINB Model and Graph Attention Autoencoder

KONG Fengling, WU Hao, DONG Qingqing   

  1. School of Information Science and Engineering, Yunnan University, Kunming 650500, China
  • Received: 2022-10-21 Revised: 2023-03-14 Online: 2023-12-15 Published: 2023-12-11
  • About author: KONG Fengling, born in 1997, postgraduate, is a student member of China Computer Federation. Her main research interests include biological information technology and image processing.
    WU Hao, born in 1982, Ph.D., lecturer, is a senior member of China Computer Federation. His main research interests include image processing, computer vision and bioinformatics analysis.
  • Supported by:
    National Natural Science Foundation of China (62061049) and Yunnan Fundamental Research Projects (2018FB100).

Abstract: A central task in single-cell data analysis is grouping individual cells into subpopulation clusters. However, owing to the limitations of sequencing principles and platforms, single-cell datasets are generally high-dimensional and sparse, with high-variance noise and substantial data loss, which poses many challenges for the cluster analysis and application of single-cell data. Single-cell clustering methods proposed in recent years mainly model the relationship between cells and gene expression. They neither fully mine the latent feature relationships between cells nor remove noise, which leads to unsatisfactory clustering results and hinders downstream analysis. To address these problems, a self-optimized single-cell clustering algorithm (scZDGAC) that combines the zero-inflated negative binomial (ZINB) model with a graph attention autoencoder is proposed. The algorithm first combines the ZINB model with the scalable DCA denoising algorithm, using the ZINB distribution to better fit the feature distribution of the data. This improves the denoising performance of the autoencoder and reduces the impact of noise and data loss on the output of the KNN algorithm. It then uses the graph attention autoencoder to propagate information between cells with different weights, which better captures the latent features between cells for clustering. Finally, scZDGAC uses a self-optimization method so that the clustering module and the feature module, which were originally independent, benefit from each other, and it iteratively updates the cluster centers to further improve clustering performance. To evaluate the clustering results, the adjusted Rand index (ARI) and normalized mutual information (NMI) are used as two general evaluation metrics. Experimental results on six single-cell datasets of different scales show that the clustering performance of the proposed algorithm is greatly improved.
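
To make the ZINB step in the abstract more concrete, the following is a minimal sketch of a zero-inflated negative binomial log-likelihood in plain Python. It is not the scZDGAC implementation: the abstract gives no formula, so the mean/dispersion/zero-inflation parameterization (`mu`, `theta`, `pi`) is assumed here, and the function names are hypothetical. In a denoising autoencoder these three parameters would typically be predicted per cell and gene, and the negative log-likelihood would serve as the reconstruction loss.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with
    mean mu > 0 and dispersion theta > 0 (assumed parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a ZINB model.

    pi in [0, 1) is the probability of an extra ("inflated") zero;
    with probability 1 - pi the count comes from the NB component.
    """
    if x == 0:
        # A zero can come from the inflation component or from the NB itself.
        nb_zero = math.exp(nb_log_pmf(0, mu, theta))
        return math.log(pi + (1.0 - pi) * nb_zero)
    # Non-zero counts can only come from the NB component.
    return math.log(1.0 - pi) + nb_log_pmf(x, mu, theta)


def zinb_nll(counts, mu, theta, pi):
    """Negative log-likelihood of a list of counts sharing one (mu, theta, pi)."""
    return -sum(zinb_log_pmf(x, mu, theta, pi) for x in counts)


if __name__ == "__main__":
    # Toy expression vector for one gene across six cells (made-up values).
    counts = [0, 0, 3, 0, 1, 7]
    print(zinb_nll(counts, mu=2.0, theta=1.5, pi=0.3))
```

Raising `pi` increases the probability assigned to zeros without changing the shape of the NB component, which is why this family is a common choice for sparse count matrices with many missing values.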

Key words: Deep clustering, scRNA-seq, ZINB model, Self-optimization, DCA, Graph attention autoencoder

CLC Number: 

  • Q811.4