Computer Science ›› 2025, Vol. 52 ›› Issue (5): 161-170. doi: 10.11896/jsjkx.240300110
LI Zhijie (李志杰)1, LIAO Xuhong (廖旭红)1, LI Qinglan (李青蓝)2, LIU Li (刘丽)3
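Abstract: Cancer is the leading killer of human health. With the rapid development of sequencing technology, massive amounts of cancer gene expression data have accumulated, and predicting pathogenic genes with computational methods has become a new focus of cancer research. However, most current approaches to pathogenic gene prediction rely on gene interaction networks and similar resources, and rarely consider the potential link between local network connectivity and differential gene expression. To address this, the proposed method first takes the differences in gene expression before and after disease onset, computes the correlation between genes by mutual information, and constructs an adjacency network. It then designs a feature vector model for cancer pathogenic gene prediction, in which the vector features comprise the differential expression information of the candidate gene and of its nearest neighbors. For the experiments, cancer-related pathogenic and non-pathogenic genes, together with differential expression data from before and after disease onset, are obtained from public databases such as TCGA, OMIM and GEO, and the differential expression information of each gene and its nearest neighbors in the adjacency network is used to predict cancer pathogenic genes (Differential Information of Gene and Nearest Neighbor for Cancer Pathogenic Gene Prediction, DICPG). The experimental results show that the DICPG cancer gene classification model has clear biological significance and that its classification accuracy, AUC and other performance metrics are better than those of comparable methods.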
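The pipeline outlined in the abstract (mutual information between genes, an adjacency network, then a feature vector built from a gene and its nearest neighbors) can be illustrated with a minimal sketch. This is not the authors' implementation: the histogram-based mutual information estimate, the bin count, the choice of the top-k neighbors, and the use of mean absolute difference as the per-gene differential score are all assumptions made for illustration, and every function name is hypothetical.

```python
import numpy as np

def mutual_information(x, y, bins=4):
    """Histogram (plug-in) estimate of I(X;Y) in bits; the estimator is an assumption."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def mi_matrix(diff, bins=4):
    """Pairwise mutual information between rows (genes) of a genes x samples matrix."""
    n = diff.shape[0]
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mi[i, j] = mi[j, i] = mutual_information(diff[i], diff[j], bins)
    return mi

def nearest_neighbors(mi, k):
    """Indices of the k genes with the highest mutual information to each gene."""
    m = mi.copy()
    np.fill_diagonal(m, -np.inf)          # a gene is never its own neighbor
    return np.argsort(-m, axis=1)[:, :k]

def feature_vectors(diff, neighbors):
    """Row i = [score of gene i, scores of its k neighbors]; the score is an assumption."""
    score = np.abs(diff).mean(axis=1)     # mean absolute expression difference per gene
    return np.column_stack([score, score[neighbors]])

# Toy input: expression differences (diseased minus normal) for 6 genes x 40 samples.
rng = np.random.default_rng(0)
diff = rng.normal(size=(6, 40))
mi = mi_matrix(diff)
neighbors = nearest_neighbors(mi, k=2)
features = feature_vectors(diff, neighbors)   # shape (6, 3)
```

The resulting feature matrix could then be passed to any standard classifier together with pathogenic and non-pathogenic labels; the abstract does not say which classifier DICPG uses, so that step is left out here.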