Computer Science, 2019, Vol. 46, Issue (7): 300-307. doi: 10.11896/j.issn.1002-137X.2019.07.046

• Interdiscipline & Frontier •

Cancer Classification Prediction Model Based on Correlation and Similarity

ZHANG Xue-fu, ZENG Pan, JIN Min

  1. (College of Computer Science and Electronic Engineering, Hunan University, Changsha 410006, China)
  • Received: 2018-06-15 Online: 2019-07-15 Published: 2019-07-15

Abstract: Cancer diagnosis based on empirical histopathology often has a high misdiagnosis rate. Analyzing cancer at the gene level is currently one of the important ways to improve the accuracy of cancer classification prediction. Biological studies have shown that genes related to the same kind of cancer share common functional characteristics. Based on this, this paper proposes a method that integrates correlation and similarity for cancer classification prediction. First, on the one hand, the differential expression of genes is analyzed statistically, using mutual information to compute correlation on gene expression profiles. On the other hand, similarity between genes is analyzed on the basis of biological mechanisms: functional similarity between genes is computed from the protein interaction network and from GO data, based on topological similarity and semantic similarity respectively. The two are then combined, that is, the feature set is selected by simultaneously maximizing relevance and similarity with respect to the target set. Then, diverse samples of the data set are drawn with the Bootstrap method and, on the basis of the selected feature set, multiple different machine learning algorithms are used to train a number of differentiated prediction models. Finally, the multiple models classify the test samples, and the final classification result is obtained through a decision model. In classification prediction on four different cancer datasets from GEO, compared with the latest research methods, the classification accuracy on each dataset improved by about 5%, and by up to 10% relative to the IG/SGA method. The experimental results show that combining correlation and similarity can effectively improve the accuracy of cancer classification prediction. The selected characteristic genes help reveal biological significance, and the complementary advantages of multiple algorithms address the limited application scope of a single classification algorithm.
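
As a rough illustration of three building blocks the abstract names (mutual-information correlation, Bootstrap sampling, and combining several models' outputs), the following minimal Python sketch may help. It is not the authors' implementation: the function names are hypothetical, expression values are assumed to be already discretized into levels such as "high" and "low", and a simple majority vote stands in for the paper's decision model, which the abstract does not specify.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs only.
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def bootstrap_indices(n, rng):
    """Draw n sample indices with replacement (one Bootstrap replicate)."""
    return [rng.randrange(n) for _ in range(n)]


def majority_vote(predictions):
    """Assumed stand-in for the decision model: most frequent predicted label."""
    return Counter(predictions).most_common(1)[0][0]


# Toy data (made up for illustration): four samples, two discretized genes.
labels = ["tumor", "tumor", "normal", "normal"]
gene_a = ["high", "high", "low", "low"]  # tracks the class label exactly
gene_b = ["high", "low", "high", "low"]  # statistically independent of the label

score_a = mutual_information(gene_a, labels)
score_b = mutual_information(gene_b, labels)

# One Bootstrap replicate of the sample indices, seeded for repeatability.
replicate = bootstrap_indices(len(labels), random.Random(0))

# Combine three hypothetical model outputs for one test sample.
final_label = majority_vote(["tumor", "normal", "tumor"])
```

In this toy data `gene_a` carries one bit of information about the label and `gene_b` carries none, so a correlation-based filter would rank `gene_a` first. In the method described above this score would be combined with the topological and semantic similarity terms before features are selected; that combination is not shown here because the abstract does not give its form.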

Key words: Cancer classification, Correlation, Diversity sampling, Multiple algorithms and multiple models, Semantic similarity, Topological similarity

CLC Number: TP391.9