Computer Science ›› 2025, Vol. 52 ›› Issue (5): 161-170. doi: 10.11896/jsjkx.240300110

• Database & Big Data & Data Science •

Cancer Pathogenic Gene Prediction Based on Differential Co-expression Adjacent Network

LI Zhijie1, LIAO Xuhong1, LI Qinglan2, LIU Li3

  1. 1 School of Information Science and Engineering, Hunan Institute of Science and Technology, Yueyang, Hunan 414006, China
    2 Medical College, University of Pennsylvania, Philadelphia 19019, USA
    3 Medical College, Virginia Commonwealth University, Richmond 23284, USA
  • Received: 2024-03-18 Revised: 2024-07-29 Online: 2025-05-15 Published: 2025-05-12
  • Corresponding author: LIAO Xuhong (lxh2402@163.com)
  • About author (lzj0019@163.com): LI Zhijie, born in 1964, Ph.D., associate professor. His main research interests include computational biology, online learning of big data, and data mining.
    LIAO Xuhong, born in 1997, master's degree. Her main research interests include computational biology and data mining.
  • Supported by:
    National Natural Science Foundation of China (62072475, 61672391) and Hunan Provincial Natural Science Foundation, China (2019JJ40111).

Abstract: Cancer is the leading threat to human health. With the rapid development of sequencing technology, massive amounts of cancer gene expression data have accumulated, and predicting pathogenic genes with computational methods has become a new hotspot in cancer research. However, current pathogenic gene prediction is mostly based on gene interaction networks and similar resources, and rarely considers the potential connection between local network connectivity and differential gene expression. To address this problem, this paper first uses gene expression difference data from before and after disease onset, computes the correlation between genes through mutual information, and constructs an adjacency network. A feature vector model is then designed for cancer pathogenic gene prediction, in which the vector features comprise the differential expression information of a candidate gene and its nearest neighbors. For the experiments, cancer-related pathogenic and non-pathogenic genes, together with gene differential expression data from before and after disease onset, are obtained from public databases such as TCGA, OMIM, and GEO. The resulting method, which uses the differential expression information of genes and their nearest neighbors in the adjacency network, is named Differential Information of Gene and Nearest Neighbor for Cancer Pathogenic Gene Prediction (DICPG). Experimental results show that the DICPG cancer gene classification model is biologically meaningful and that it outperforms comparable methods on classification accuracy, AUC, and other performance metrics.

Key words: Gene differential expression data, Adjacent network, Candidate gene, Gene feature vector, Cancer pathogenic gene prediction

CLC Number: 

  • TP181
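
The pipeline outlined in the abstract (differential expression, mutual information between genes, an adjacency network, then feature vectors built from a gene and its nearest neighbors) can be illustrated with a minimal sketch. This is not the authors' DICPG implementation: the function names, the histogram-based mutual information estimator, the bin count, the choice of k nearest neighbors, the mean absolute difference used as the expression score, and the synthetic data are all assumptions made for illustration only.

```python
import numpy as np


def mutual_information(x, y, bins=4):
    """Histogram estimate of I(X;Y) in nats (assumed estimator, not from the paper)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()              # joint distribution over bin pairs
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y, shape (1, bins)
    nz = pxy > 0                           # skip empty cells to avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())


def nearest_neighbors(diff, k=2):
    """For each gene (row of diff), return the indices of the k genes with highest MI."""
    n = diff.shape[0]
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mi[i, j] = mi[j, i] = mutual_information(diff[i], diff[j])
    np.fill_diagonal(mi, -np.inf)          # a gene is never its own neighbor
    return np.argsort(-mi, axis=1)[:, :k]


def feature_vectors(diff, neighbors):
    """Row i = [score of gene i, scores of its k neighbors] (hypothetical feature layout)."""
    score = np.abs(diff).mean(axis=1)      # mean absolute expression difference per gene
    return np.column_stack([score, score[neighbors]])


# Synthetic toy data (assumption): 6 genes, 40 paired normal/tumor samples.
rng = np.random.default_rng(0)
n_genes, n_samples = 6, 40
normal = rng.normal(size=(n_genes, n_samples))
tumor = normal + rng.normal(size=(n_genes, n_samples))
# Make gene 1 an exact multiple of gene 0 so the two are strongly dependent.
normal[1], tumor[1] = 2 * normal[0], 2 * tumor[0]

diff = tumor - normal                      # differential expression per gene and sample
neighbors = nearest_neighbors(diff, k=2)
features = feature_vectors(diff, neighbors)
```

Each row of `features` could then be passed, together with pathogenic or non-pathogenic labels, to any standard classifier. The abstract does not specify the classifier or the exact feature definition, so that step is left out here.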