Computer Science, 2022, Vol. 49, Issue (1): 264-270. doi: 10.11896/jsjkx.201100129

• Artificial Intelligence •

Disease Genes Recognition Based on Information Propagation

LI Jia-wen, GUO Bing-hui, YANG Xiao-bo, ZHENG Zhi-ming

  1. Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China
    Peng Cheng Laboratory, Shenzhen, Guangdong 518055, China
    Key Laboratory of Mathematics, Informatics and Behavioral Semantics (Ministry of Education), School of Mathematical Sciences, Beihang University, Beijing 100191, China
  • Received: 2020-11-17  Revised: 2021-04-18  Online: 2022-01-15  Published: 2022-01-18
  • Corresponding author: GUO Bing-hui (guobinghui@buaa.edu.cn)
  • About author: LI Jia-wen (jiawenli@buaa.edu.cn), born in 1996, postgraduate, is a member of China Computer Federation. His main research interests include complex networks and bioinformatics.
    GUO Bing-hui, born in 1982, associate professor, is a professional member of China Computer Federation. His main research interests include data science and complex intelligent systems.
  • Supported by:
    Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0102301), National Natural Science Foundation of China (11671025) and Fundamental Research of Civil Aircraft (MJ-F-2012-04).

Abstract: Gene research occupies an important position in the life sciences and medicine, and disease-causing genes are one of its key focuses. Accurate identification of disease-causing genes can reveal the pathogenesis of diseases at the molecular level and provide strong support for their prevention, diagnosis and treatment. The key to accurate identification is a measure of similarity between genes. This paper models biological systems as complex networks and proposes DRWMR, a dissipative random walk model with multi-source restart, to measure the degree of functional similarity between genes. First, a human gene-gene interaction network is constructed from NCBI and other biological databases. Experiments are then carried out on KEGG's disease-gene association dataset to identify known disease-causing genes. Compared with three existing models (SP, RWR and PRINCE), DRWMR correctly predicts 156 of 581 diseases, whereas the other models correctly predict 121.3 on average; the average prediction score of DRWMR is 9.46% higher than the mean score of the other models. Finally, the proposed model is used to predict potential disease-causing genes for asthma, hemophilia and PEHO syndrome, and every predicted candidate is supported by theoretical or experimental evidence in the literature or in biological databases.

Key words: Bioinformatics, Complex networks, Gene function prediction, Information propagation
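
To make the propagation idea in the abstract concrete, the following is a minimal sketch of a random walk with restart from several seed genes on a toy network. It is not the authors' DRWMR: the abstract does not specify the dissipation mechanism, so the `decay` factor here is only an assumed stand-in that removes a fixed share of the propagated score at each step. The function names, parameter values and the toy network are all hypothetical.

```python
def propagate_scores(adjacency, seeds, restart=0.3, decay=1.0,
                     tol=1e-12, max_iter=10_000):
    """Random walk with restart from several seed nodes.

    adjacency: symmetric matrix (list of lists) of non-negative edge weights.
    seeds:     indices of known disease genes; restart mass is split evenly.
    decay:     ASSUMED dissipation factor in (0, 1]; 1.0 gives the plain walk.
    """
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = []
        for i in range(n):
            # Each node j passes its score to neighbours in proportion to edge weight.
            inflow = sum(adjacency[j][i] * p[j] / degree[j]
                         for j in range(n) if degree[j] > 0)
            nxt.append((1.0 - restart) * decay * inflow + restart * p0[i])
        # Stop once the scores change by less than tol in total.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


def rank_candidates(scores, seeds):
    """Return non-seed node indices ordered from highest to lowest score."""
    seed_set = set(seeds)
    candidates = [i for i in range(len(scores)) if i not in seed_set]
    return sorted(candidates, key=lambda i: scores[i], reverse=True)


# Toy chain of five genes: 0 - 1 - 2 - 3 - 4 (hypothetical data).
toy_network = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
toy_scores = propagate_scores(toy_network, seeds=[0])
toy_ranking = rank_candidates(toy_scores, seeds=[0])
```

In this sketch, genes closer to the seeds in the network receive higher scores, which is the sense in which the propagated score acts as a similarity measure. With `decay` below 1 the total score is no longer conserved, which is one simple way to model loss during propagation.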

CLC Number:

  • R319