Computer Science, 2013, Vol. 40, Issue (7): 216-221.


Fast Approach to Mutual Information Based Gene Selection with Fuzzy Rough Sets

XU Fei-fei, WEI Lai, DU Hai-zhou and WANG Wen-huan

Online: 2018-11-16  Published: 2018-11-16

Abstract: Feature selection is an essential step in cancer classification with DNA microarrays. Rough set theory has already been applied successfully to gene selection. To avoid the information loss caused by discretizing continuous gene expression data in classical rough set theory, fuzzy rough set theory is applied to gene selection instead. A fuzzy rough attribute reduction algorithm based on mutual information has been proposed and applied to gene selection, but its computational cost becomes prohibitive when the number of selected genes is large. This paper proposes an approximation that replaces the computation of the mutual information, based on both maximum relevance and maximum significance. The proposed method improves efficiency and reduces computational complexity. Extensive experiments were conducted on three public gene expression datasets, and the results confirm the efficiency and effectiveness of the algorithm.

Key words: Feature selection, Fuzzy rough sets, Mutual information, Gene expression data
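The abstract gives no formulas, so the following is only a minimal Python sketch of what a mutual-information-driven fuzzy rough gene selection computes, and why repeated mutual-information evaluations are expensive. It assumes a common fuzzy entropy definition, H(R) = -(1/n) Σ_i log2(|[x_i]_R| / n), where |[x_i]_R| is the row sum of the fuzzy relation R; joint relations are formed by the element-wise minimum; and I(R; D) = H(R) + H(D) - H(R, D). The linear similarity used for each gene, the plain forward-greedy search, and all function names (`gene_relation`, `greedy_select`, etc.) are hypothetical choices for illustration. This is not the authors' algorithm, and it does not reproduce their maximum-relevance/maximum-significance approximation.

```python
import math
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def gene_relation(values: Sequence[float]) -> Matrix:
    """Hypothetical fuzzy similarity for one gene:
    r_ij = max(0, 1 - |v_i - v_j| / (max(v) - min(v)))."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # constant gene: every pair fully similar
    return [[max(0.0, 1.0 - abs(a - b) / span) for b in values] for a in values]


def class_relation(labels: Sequence[object]) -> Matrix:
    """Crisp equivalence relation induced by the class labels (the decision D)."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def intersect(r: Matrix, s: Matrix) -> Matrix:
    """Relation of a combined attribute set: element-wise minimum."""
    return [[min(x, y) for x, y in zip(rr, sr)] for rr, sr in zip(r, s)]


def entropy(r: Matrix) -> float:
    """Assumed fuzzy entropy H(R) = -(1/n) * sum_i log2(|[x_i]_R| / n).
    Each row sum is >= 1 because the relations are reflexive (r_ii = 1)."""
    n = len(r)
    return -sum(math.log2(sum(row) / n) for row in r) / n


def mutual_information(r: Matrix, s: Matrix) -> float:
    """I(R; S) = H(R) + H(S) - H(R, S), with H(R, S) = H(min(R, S))."""
    return entropy(r) + entropy(s) - entropy(intersect(r, s))


def greedy_select(genes: List[Sequence[float]], labels: Sequence[object],
                  k: int, eps: float = 1e-9) -> List[int]:
    """Forward selection: repeatedly add the gene that gives the largest
    I(R_B; D) for the selected subset B; stop when no gene increases it.
    genes[g] holds the expression values of gene g across all samples."""
    d = class_relation(labels)
    relations = [gene_relation(g) for g in genes]
    selected: List[int] = []
    current: Optional[Matrix] = None  # relation of the selected subset B
    current_mi = 0.0                  # I(R_B; D), taken as 0 for empty B
    while len(selected) < k:
        best = None  # (gene index, candidate relation, its mutual information)
        for g, rel in enumerate(relations):
            if g in selected:
                continue
            cand = rel if current is None else intersect(current, rel)
            mi = mutual_information(cand, d)
            if mi > current_mi + eps and (best is None or mi > best[2]):
                best = (g, cand, mi)
        if best is None:  # no remaining gene increases I(R_B; D)
            break
        g, current, current_mi = best
        selected.append(g)
    return selected


if __name__ == "__main__":
    # Toy data: 3 hypothetical genes x 6 samples, two classes.
    genes = [
        [0.1, 0.2, 0.1, 0.9, 0.8, 0.9],  # separates the classes
        [0.5, 0.4, 0.6, 0.5, 0.4, 0.6],  # identical pattern in both classes
        [0.3, 0.7, 0.2, 0.8, 0.1, 0.9],  # weakly informative
    ]
    classes = ["A", "A", "A", "B", "B", "B"]
    print(greedy_select(genes, classes, k=2))
```

On this toy data the search keeps only the first gene, because adding either of the others lowers I(R_B; D) under the assumed definitions. Each candidate evaluation touches a full n×n relation, so one greedy step costs roughly (number of genes) × n² operations, repeated for every gene added; the paper's approximation targets this mutual-information computation, but its exact form is not shown here.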
