Computer Science (计算机科学), 2019, Vol. 46, Issue 7: 300-307. doi: 10.11896/j.issn.1002-137X.2019.07.046
ZHANG Xue-fu (张学扶), ZENG Pan (曾攀), JIN Min (金敏)
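Abstract: Cancer diagnosis based on empirical histopathology often has a high misdiagnosis rate. Analyzing cancer at the gene level is currently one of the main ways to improve the accuracy of cancer classification and prediction. Biological studies show that genes associated with the same cancer share common functional characteristics. Based on this observation, this paper proposes an ensemble method for cancer classification and prediction that combines correlation and similarity. First, on the statistical side, the differential expression of genes is analyzed, and mutual information is used to compute correlation on gene expression profile data. On the biological side, similarity between genes is analyzed: topological similarity is computed on the protein-protein interaction network and semantic similarity on Gene Ontology (GO) data, and the two are combined into a functional similarity between genes. The feature gene set is then selected by simultaneously maximizing the correlation and the similarity of the target set. Next, the Bootstrap method is used to draw diverse samples from the dataset, and, on the basis of the selected feature gene set, several machine learning algorithms are trained to obtain multiple classification models that differ substantially from one another. Finally, these models classify the test samples, and a decision model produces the final classification result. Classification experiments were carried out on four different cancer datasets from GEO, and the proposed method was compared comprehensively with recent methods. The proposed method improves classification accuracy by about 5% on every dataset, and by up to 10% compared with the IG/SGA method. The experimental results show that combining correlation and similarity effectively improves the accuracy of cancer classification and prediction, that the selected feature genes help reveal biological meaning, and that combining the complementary strengths of several algorithms addresses the limited applicability of any single classification algorithm.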
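The abstract names the building blocks (mutual information, a combined correlation and similarity criterion, Bootstrap sampling, and a decision model) but does not give their exact formulas. The following is a minimal sketch under stated assumptions, not the authors' implementation: expression values are assumed to be already discretized, the weighted greedy score with `alpha` is a hypothetical stand-in for "simultaneously maximizing correlation and similarity", the `sim` lookup table stands in for the combined topological and semantic similarity, and a simple majority vote stands in for the decision model. All function names are illustrative.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def select_genes(expr, labels, sim, k, alpha=0.5):
    """Greedily pick k genes (assumed scoring rule, not taken from the paper).

    expr:   dict gene -> list of discretized expression values, one per sample
    labels: list of class labels, one per sample
    sim:    dict frozenset({gene_a, gene_b}) -> functional similarity in [0, 1]
    alpha:  hypothetical weight between relevance and similarity
    """
    relevance = {g: mutual_information(v, labels) for g, v in expr.items()}
    selected, candidates = [], set(expr)

    def score(g):
        if not selected:
            # No gene chosen yet: rank by relevance to the class label only.
            return relevance[g]
        mean_sim = sum(sim.get(frozenset((g, s)), 0.0) for s in selected) / len(selected)
        return alpha * relevance[g] + (1 - alpha) * mean_sim

    while candidates and len(selected) < k:
        # sorted() makes tie-breaking deterministic
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


def bootstrap_indices(n, seed=None):
    """Draw n sample indices with replacement (one Bootstrap replicate)."""
    rng = random.Random(seed)
    return rng.choices(range(n), k=n)


def majority_vote(predictions):
    """Combine the predictions of several models for one sample by majority vote."""
    return Counter(predictions).most_common(1)[0][0]
```

In a full pipeline, each Bootstrap replicate would be used to train one classifier on the selected genes, and `majority_vote` would be applied per test sample to the outputs of those classifiers. The classifiers themselves are left out here because the abstract does not say which algorithms were used.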