Computer Science (计算机科学), 2018, Vol. 45, Issue (2): 69-75. doi: 10.11896/j.issn.1002-137X.2018.02.012

• 2017 China Computer Federation Conference on Artificial Intelligence •

Improved Ensemble Method on MicroRNA Prediction Model

DONG Hong-bin, SHI Li and LI Tao

  1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
  • Online: 2018-02-15  Published: 2018-11-13
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61472095)

Abstract: Existing microRNA prediction methods often suffer from class-imbalanced data sets and from being applicable to only a single species. To address these problems, this work makes four contributions. First, a stratified sampling algorithm based on sequence entropy is proposed, which generates a training set with balanced numbers of positive and negative samples while preserving the overall distribution of the samples. Second, a feature selection method based on signal-to-noise ratio and correlation is proposed to reduce the size of the training set and thereby speed up training. Third, the DS-GA algorithm is proposed to shorten the time needed to optimize the SVM classifier parameters and to reduce over-fitting. Finally, following the idea of ensemble learning, a cross-species microRNA prediction model is built through three steps: sampling, feature selection, and classifier parameter optimization. Experiments show that the model effectively handles the class imbalance problem, is not limited to a single species, and achieves good results on a mixed-species test set.

Key words: MicroRNA, Prediction, Sampling, Feature selection, Class imbalance
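
The abstract does not spell out the sampling algorithm, so the following is only an illustrative sketch of the general idea of entropy-stratified undersampling, not the authors' method. It assumes that "sequence entropy" means the Shannon entropy of a sequence's nucleotide composition and that strata are equal-width entropy bins; the function names `sequence_entropy`, `stratified_sample`, and `balanced_training_set` and the parameter `n_bins` are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def sequence_entropy(seq):
    """Shannon entropy (in bits) of the nucleotide composition of seq."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def stratified_sample(seqs, k, n_bins=4, seed=0):
    """Draw roughly k sequences, giving each entropy bin a quota
    proportional to its share of seqs (assumed stratification rule)."""
    rng = random.Random(seed)
    max_h = 2.0  # maximum entropy for a 4-letter alphabet (A, C, G, U)
    bins = defaultdict(list)
    for s in seqs:
        # Equal-width bins over [0, max_h]; clamp the top edge into the last bin.
        b = min(int(sequence_entropy(s) / max_h * n_bins), n_bins - 1)
        bins[b].append(s)
    sample = []
    for b in sorted(bins):
        members = bins[b]
        # Rounding means the total can differ slightly from k.
        quota = max(1, round(k * len(members) / len(seqs)))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


def balanced_training_set(positives, negatives, seed=0):
    """Undersample the larger class to about the size of the smaller one."""
    k = min(len(positives), len(negatives))
    pos = positives if len(positives) == k else stratified_sample(positives, k, seed=seed)
    neg = negatives if len(negatives) == k else stratified_sample(negatives, k, seed=seed)
    return pos, neg
```

Because each bin's quota follows its share of the original class, the undersampled class keeps approximately the same entropy distribution as before, which is the property the abstract attributes to the proposed sampling step.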

