Computer Science ›› 2015, Vol. 42 ›› Issue (7): 250-253. doi: 10.11896/j.issn.1002-137X.2015.07.053

Two Novel Tree Structure-based Methods for Gene Selection

XIE Qian-qian, LI Ding-fang and ZHANG Wen   

  • Online: 2018-11-14  Published: 2018-11-14

Abstract: Cancer diagnosis is one of the most significant topics in bioinformatics. For microarray datasets, selecting a small subset of genes from thousands of candidates (known as gene selection) helps in the accurate identification and treatment of cancerous tumors. Motivated by the intuition behind how random forests measure variable importance (named ‘PBM’), we propose two novel methods for gene selection based on tree structures, namely FBM and ABM. They make use of, respectively, the gene frequency and the average scores yielded by a large number of decision trees constructed on the microarray datasets. In computational experiments, the optimal gene subsets are determined by the three methods, and random-forest classifiers are built on these subsets to evaluate the performance of each gene selection method. The AUC scores of PBM exceed 0.900 when 26 genes are selected for the leukemia dataset and 48 genes for the colon cancer dataset, while the classifiers with FBM and ABM achieve an AUC score of 0.989 on the leukemia dataset and 0.900 on the colon cancer dataset, respectively, with only the top ten genes selected. In addition, the proposed methods perform better than existing methods (such as mRMR and ECRP). Gene selection methods of this kind play a critical role in the accurate diagnosis and treatment of cancer.
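
The abstract does not spell out how gene frequency or average scores are computed, so the following is only a minimal sketch of the general idea, not the authors' FBM or ABM. It assumes a hypothetical representation in which each decision tree is a list of (gene, score) pairs, one per split node, and the per-split score is simply a number supplied by the caller. The function name rank_genes and the toy data are illustrative.

```python
from collections import defaultdict

# Assumption: each tree is a list of (gene, score) pairs, one per split node.
# The abstract does not define the score; here it is an arbitrary number.

def rank_genes(trees, top_k=10):
    """Return (frequency_ranking, average_ranking) as lists of gene names."""
    counts = defaultdict(int)    # number of split nodes that use each gene
    totals = defaultdict(float)  # sum of split scores for each gene
    for tree in trees:
        for gene, score in tree:
            counts[gene] += 1
            totals[gene] += score
    averages = {g: totals[g] / counts[g] for g in counts}
    # Sort by descending value; ties are broken by gene name for determinism.
    by_frequency = sorted(counts, key=lambda g: (-counts[g], g))[:top_k]
    by_average = sorted(averages, key=lambda g: (-averages[g], g))[:top_k]
    return by_frequency, by_average

# Toy forest of three trees over hypothetical genes g1, g2, g3.
toy_trees = [
    [("g1", 0.40), ("g2", 0.10)],
    [("g1", 0.30), ("g3", 0.50)],
    [("g2", 0.20), ("g1", 0.20)],
]
freq_rank, avg_rank = rank_genes(toy_trees, top_k=2)
```

In this toy forest, g1 is used most often and so leads the frequency ranking, while g3 appears only once but with the highest score and so leads the average-score ranking. The two criteria can therefore select different gene subsets from the same set of trees.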

Key words: Classification, Gene selection, Random forests

