Computer Science, 2013, Vol. 40, Issue (Z6): 125-128.

• Data Storage and Mining •

An Improved Gene Feature Selection Method Based on Tolerance Rough Sets

JIAO Na

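  1. Department of Information Science and Technology, East China University of Political Science and Law, Shanghai 201620
  • Publication date: 2018-11-16 Release date: 2018-11-16
  • Funding:
    This work was supported by the Shanghai University Young Teacher Training Program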

Evolutionary Gene Selection Based on Tolerance Rough Set Theory

JIAO Na   

  • Online: 2018-11-16 Published: 2018-11-16

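Abstract: Effective gene selection is an important part of research on cancer gene expression data. Rough set theory is an effective tool for removing redundant features. Because gene expression data are continuous, and to avoid the information loss caused by the discretization step that rough set methods require, tolerance rough sets are applied to gene feature selection, yielding a gene feature selection method based on tolerance rough sets. Building on this method, the boundary region of the rough set is studied further, and an improved gene feature selection method based on tolerance rough sets is proposed. Experiments on two standard gene expression data sets show that, compared with traditional gene feature selection methods, the proposed method effectively improves classification accuracy.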

Keywords: rough set, tolerance relation, gene feature selection, gene expression data, cancer classification

Abstract: Gene selection aims to pick the most informative genes from the whole gene set and is a key step in the discriminant analysis of microarray data. Rough set theory is an efficient mathematical tool for reducing redundancy. The main limitation of traditional rough set theory is its lack of effective methods for handling real-valued data, yet gene expression data sets are continuous. This limitation has usually been addressed by discretization, which may result in information loss. This paper investigates an approach that combines feature ranking with feature selection based on tolerance rough set theory. It further explores a second method that uses the information contained in the boundary region to improve classification accuracy on gene expression data. Compared with a gene selection algorithm based on traditional rough set theory, the proposed methods are more effective at selecting highly discriminative genes for cancer classification.

Key words: Rough set theory, Tolerance relation, Gene selection, Gene expression data, Cancer classification
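
The abstract does not give formulas, so the following is only a minimal sketch of how tolerance rough sets can score real-valued genes without discretization. It assumes the commonly used formulation: two samples are similar on a gene when one minus their range-normalized distance is at least a threshold `tau`; a sample belongs to the lower approximation of a class when its whole tolerance class lies inside that class; and the boundary region is the upper approximation minus the lower one. The function names, the toy data, and the value `tau = 0.8` are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Set, Tuple


def tolerance_classes(data: Sequence[Sequence[float]],
                      attrs: Sequence[int], tau: float) -> List[Set[int]]:
    """For each sample i, collect the samples similar to i on every attribute in attrs."""
    # Range of each attribute, used to normalise distances to [0, 1].
    ranges = {}
    for a in attrs:
        col = [row[a] for row in data]
        ranges[a] = (max(col) - min(col)) or 1.0  # avoid division by zero
    classes = []
    for i in range(len(data)):
        cls = set()
        for j in range(len(data)):
            # Assumed similarity: 1 - |a(x) - a(y)| / range(a), thresholded by tau.
            if all(1.0 - abs(data[i][a] - data[j][a]) / ranges[a] >= tau
                   for a in attrs):
                cls.add(j)
        classes.append(cls)
    return classes


def approximations(classes: List[Set[int]],
                   target: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Return the (lower, upper) approximations of target under the tolerance classes."""
    lower = {i for i, c in enumerate(classes) if c <= target}
    upper = {i for i, c in enumerate(classes) if c & target}
    return lower, upper


def dependency(data: Sequence[Sequence[float]], labels: Sequence[int],
               attrs: Sequence[int], tau: float) -> float:
    """Fraction of samples in the positive region (union of lower approximations)."""
    classes = tolerance_classes(data, attrs, tau)
    positive: Set[int] = set()
    for label in set(labels):
        target = {i for i, y in enumerate(labels) if y == label}
        lower, _ = approximations(classes, target)
        positive |= lower
    return len(positive) / len(data)


# Hypothetical toy data: 6 samples, 2 genes. Gene 0 separates the classes, gene 1 does not.
data = [
    [0.10, 0.50],
    [0.15, 0.90],
    [0.20, 0.10],
    [0.80, 0.55],
    [0.85, 0.95],
    [0.90, 0.15],
]
labels = [0, 0, 0, 1, 1, 1]

print(dependency(data, labels, [0], tau=0.8))  # 1.0: gene 0 alone determines the class
print(dependency(data, labels, [1], tau=0.8))  # 0.0: every tolerance class mixes labels

# Boundary region of class 0 when only gene 1 is used.
lower, upper = approximations(tolerance_classes(data, [1], 0.8), {0, 1, 2})
print(sorted(upper - lower))  # all six samples are uncertain
```

In this sketch a gene with a high dependency value would be preferred during selection, while samples in the boundary region are those a boundary-aware criterion could still draw information from.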

