Computer Science, 2022, Vol. 49, Issue (2): 265-271. doi: 10.11896/jsjkx.201100132

• Artificial Intelligence •

Ensemble Regression Decision Trees-based lncRNA-disease Association Prediction

REN Shou-peng1, LI Jin1, WANG Jing-ru1, YUE Kun2

  1 School of Software, Yunnan University, Kunming 650091, China
  2 School of Information Science & Engineering, Yunnan University, Kunming 650091, China
  • Received: 2020-11-18  Revised: 2021-05-30  Online: 2022-02-15  Published: 2022-02-23
  • Corresponding author: LI Jin (lijin@ynu.edu.cn)
  • Author contact: supi2012212@qq.com
  • About authors: REN Shou-peng, born in 1997, master. His main research interests include bioinformatics and machine learning.
    LI Jin, born in 1975, Ph.D., professor. His main research interests include machine learning and bioinformatics.
  • Supported by:
    NSFC-Yunnan Joint Fund of the National Natural Science Foundation of China (U1802271), Yunnan Fundamental Research Project for Distinguished Young Scholars (2019FJ011) and Key Project of the Applied Basic Research Program of Yunnan Province (201901BB050052).

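Abstract: Long non-coding RNAs (lncRNAs) play important roles in a wide range of complex human diseases. Inferring potential lncRNA-disease associations computationally not only helps reveal the pathogenesis of diseases but also supports their diagnosis, prevention and treatment. This paper proposes ERDTLDA, an ensemble regression decision tree-based lncRNA-disease association prediction method. First, known lncRNA-disease associations are used to construct a lncRNA similarity matrix, a disease similarity matrix and a lncRNA-disease association matrix. Second, lncRNA and disease feature vectors are built from these three matrices from different perspectives. Third, principal component analysis (PCA) is applied to extract features from the lncRNA and disease feature vectors. Finally, CART regression decision trees serve as the prediction model, and multiple trees are combined with an averaging ensemble strategy to obtain the final model. Leave-one-out cross-validation (LOOCV) experiments show that the method outperforms existing methods: its AUC on three real lncRNA-disease datasets reaches 0.9055, 0.8969 and 0.9129, improvements of 6.46%, 5.4% and 6.02%, respectively, over existing methods. In addition, case studies on breast cancer, lung cancer and gastric cancer further verify the accuracy and effectiveness of the proposed method.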
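The first three steps of the pipeline above (similarity matrices, pair feature vectors, PCA) can be sketched as follows. The abstract does not name the similarity measure, so this sketch assumes the Gaussian interaction profile (GIP) kernel, a common choice in lncRNA-disease work; the pair-feature layout (a lncRNA's similarity row and association row concatenated with a disease's similarity row and association column) and the function names `gip_similarity`, `pair_features` and `pca_transform` are hypothetical illustrations, not ERDTLDA's published definitions. PCA is done with a plain SVD of the centred feature matrix.

```python
import numpy as np


def gip_similarity(profiles: np.ndarray) -> np.ndarray:
    """Gaussian interaction profile (GIP) kernel between the rows of `profiles`.

    Assumption: GIP is an illustrative choice; the abstract does not name
    the similarity measure ERDTLDA actually uses.
    """
    sq_norms = (profiles ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    # Bandwidth normalised by the mean squared profile norm (gamma' = 1).
    gamma = 1.0 / mean_sq if mean_sq > 0 else 1.0
    # Squared Euclidean distance between every pair of rows.
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def pair_features(assoc, i, j, s_lnc, s_dis):
    """Hypothetical feature vector for the (lncRNA i, disease j) pair:
    lncRNA similarity row + lncRNA association row +
    disease similarity row + disease association column."""
    return np.concatenate([s_lnc[i], assoc[i], s_dis[j], assoc[:, j]])


def pca_transform(x, n_components):
    """Project the rows of x onto the top principal components (via SVD)."""
    centred = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


if __name__ == "__main__":
    # Toy association matrix: 4 lncRNAs x 3 diseases (1 = known association).
    A = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]], dtype=float)
    S_l = gip_similarity(A)    # 4 x 4 lncRNA similarity
    S_d = gip_similarity(A.T)  # 3 x 3 disease similarity
    X = np.array([pair_features(A, i, j, S_l, S_d)
                  for i in range(A.shape[0]) for j in range(A.shape[1])])
    Z = pca_transform(X, n_components=5)
    print(X.shape, Z.shape)  # (12, 14) (12, 5)
```

Because the similarities are derived from the association matrix itself, a leave-one-out evaluation should zero the held-out association in `A` and recompute the similarities before building that pair's features; otherwise the features leak the label being predicted.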

Key words: CART decision tree, lncRNA-disease, association prediction, ensemble learning, feature extraction


CLC number: TP391
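The final stage described in the abstract, averaging several CART regression trees and scoring the result by AUC under leave-one-out cross-validation, can be sketched in plain Python as below. A CART regression tree chooses each split by minimising the summed squared error of the two children, and each leaf predicts the mean label of its samples; the ensemble score is the mean of the trees' predictions. How ERDTLDA makes its trees differ is not stated in the abstract, so training each tree on a bootstrap sample is an assumption here, as are the depth and leaf-size limits and all function names. The demo runs a generic sample-level LOOCV on toy data; the paper's LOOCV over known lncRNA-disease associations is not reproduced.

```python
import random
from statistics import mean


def _sse(ys):
    """Sum of squared errors around the mean (the CART regression criterion)."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def build_tree(X, y, depth=0, max_depth=4, min_leaf=2):
    """Grow a CART regression tree. Leaves hold the mean label;
    internal nodes are dicts with a feature index and a threshold."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)
    best = None  # (score, feature, threshold)
    for f in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return mean(y)
    _, f, t = best
    li = [k for k, row in enumerate(X) if row[f] <= t]
    ri = [k for k, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[k] for k in li], [y[k] for k in li], depth + 1, max_depth, min_leaf),
        "right": build_tree([X[k] for k in ri], [y[k] for k in ri], depth + 1, max_depth, min_leaf),
    }


def predict_tree(node, row):
    """Follow the splits from the root to a leaf and return its value."""
    while isinstance(node, dict):
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node


def fit_averaged_trees(X, y, n_trees=10, seed=0, **tree_kw):
    """Train n_trees trees, each on a bootstrap sample (an assumption)."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        trees.append(build_tree([X[k] for k in idx], [y[k] for k in idx], **tree_kw))
    return trees


def predict_averaged(trees, row):
    """Averaging ensemble: the mean of the individual trees' predictions."""
    return mean(predict_tree(t, row) for t in trees)


def auc(scores, labels):
    """Rank-based AUC: chance a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (illustrative only): label is 1 when the first feature is large.
    rng = random.Random(1)
    X = [[rng.random(), rng.random()] for _ in range(40)]
    y = [1 if row[0] > 0.5 else 0 for row in X]
    loo_scores = []
    for k in range(len(X)):  # leave sample k out, train on the rest
        trees = fit_averaged_trees(X[:k] + X[k + 1:], y[:k] + y[k + 1:], n_trees=5, seed=k)
        loo_scores.append(predict_averaged(trees, X[k]))
    print(f"LOOCV AUC on toy data: {auc(loo_scores, y):.3f}")
```

Averaging several trees trained on different resamples mainly reduces the variance of a single deep tree, which is the usual motivation for this kind of ensemble.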