Computer Science, 2018, Vol. 45, Issue (2): 69-75. doi: 10.11896/j.issn.1002-137X.2018.02.012


Improved Ensemble Method on MicroRNA Prediction Model

DONG Hong-bin, SHI Li and LI Tao   

  • Online: 2018-02-15  Published: 2018-11-13

Abstract: Existing microRNA prediction methods often suffer from class-imbalanced data sets and from being applicable to only a single species. To address these problems, this work makes the following contributions. First, a hierarchical sampling algorithm based on sequence entropy is proposed, which generates a training set with a better balance of positive and negative samples according to the overall distribution of the samples. Second, a feature selection algorithm based on signal-to-noise ratio and correlation is designed to reduce the scale of the training set and thereby speed up training. Third, DS-GA is proposed to shorten the time needed to optimize the SVM classifier parameters and to avoid over-fitting. Finally, following the idea of ensemble learning, a general microRNA prediction model is built by combining sampling, feature selection and classifier parameter optimization. Experiments show that the model handles class imbalance effectively, is not limited to a single species, and achieves better prediction results on a mixed-species test set.

Key words: MicroRNA, Prediction, Sampling, Feature selection, Class imbalance
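
The abstract does not spell out the feature selection algorithm, so the following is only a minimal sketch of one common way to combine a signal-to-noise score with a correlation filter; it should not be read as the authors' method. It assumes the widely used per-feature score |mean_pos - mean_neg| / (std_pos + std_neg) and a greedy rule that skips any feature whose absolute Pearson correlation with an already selected feature exceeds a threshold. The function names, the `max_corr` parameter and its default value are hypothetical.

```python
from statistics import mean, pstdev


def snr_score(pos, neg):
    """Assumed SNR definition: |mean gap| divided by the sum of class std devs."""
    spread = pstdev(pos) + pstdev(neg)
    gap = abs(mean(pos) - mean(neg))
    if spread == 0:
        # Constant within both classes: perfectly separating or useless.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def select_features(X, y, k, max_corr=0.9):
    """Return indices of up to k features, ranked by SNR and filtered by correlation.

    X is a list of samples (each a list of feature values); y holds labels 1 or 0.
    """
    columns = [[row[j] for row in X] for j in range(len(X[0]))]
    scores = []
    for j, col in enumerate(columns):
        pos = [v for v, label in zip(col, y) if label == 1]
        neg = [v for v, label in zip(col, y) if label == 0]
        scores.append((snr_score(pos, neg), j))
    # Highest SNR first; ties broken by feature index for determinism.
    ranked = [j for _, j in sorted(scores, key=lambda t: (-t[0], t[1]))]
    selected = []
    for j in ranked:
        if len(selected) == k:
            break
        # Keep the feature only if it is not redundant with one already chosen.
        if all(abs(pearson(columns[j], columns[s])) <= max_corr for s in selected):
            selected.append(j)
    return selected
```

Called as `select_features(X, y, k=2)`, the function returns column indices that can be used to shrink the training matrix before the SVM is trained. Ranking first and filtering second keeps the most discriminative member of each group of near-duplicate features.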

