Computer Science, 2014, Vol. 41, Issue (9): 243-247. doi: 10.11896/j.issn.1002-137X.2014.09.046

• Artificial Intelligence •

Model-free Gene Selection Method Based on Maximum Mutual Information

WEI Sha-sha, LU Hui-juan, AN Chun-lin, ZHENG En-hui, JIN Wei

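  College of Information Engineering, China Jiliang University, Hangzhou 310018, China (WEI Sha-sha, LU Hui-juan, AN Chun-lin, JIN Wei); College of Mechanical and Electrical Engineering, China Jiliang University, Hangzhou 310018, China (ZHENG En-hui)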
  • Published: 2018-11-14  Online: 2018-11-14
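  • Funding:
    Supported by the National Natural Science Foundation of China (61272315, 60842009, 60905034), the Natural Science Foundation of Zhejiang Province (Y1110342, Y1080950), and the International Cooperation Project of the Science and Technology Department of Zhejiang Province (2012C24030).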


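Abstract: High-dimensional gene expression data from large-scale gene chips contain many irrelevant and redundant features, which can degrade classifier performance. To address this problem, we propose MMIGA-Selection, a model-free gene selection method that combines the maximum mutual information (MMI) method with a genetic algorithm and thereby turns feature selection into a global optimization problem. Its fitness function is defined as the ratio of the between-class distance to the within-class distance, a criterion that adapts well to the selection task. To evaluate the algorithm, experiments were carried out on three datasets. The results show that MMIGA-Selection performs well, achieving high 5-fold cross-validation accuracy on every dataset. MMIGA-Selection has two main advantages: first, it effectively reduces redundant genes; second, it is model-free, so the selected feature subset can be used directly with other types of classifiers while still achieving high classification accuracy.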

Key words: Maximum mutual information, Model-free, Genetic algorithm, Gene selection
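The MMI stage can be read as a filter that scores every gene by its mutual information with the class label and keeps the highest-scoring genes. The sketch below assumes that reading; the equal-width discretization, the default of 3 bins, and the helper names `discretize`, `mutual_information` and `rank_genes_by_mi` are illustrative choices, not details taken from the paper.

```python
import math
from collections import Counter


def discretize(values, n_bins=3):
    """Map continuous expression values to equal-width bin indices 0..n_bins-1.

    Equal-width binning is an assumption made for this sketch.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(..., n_bins - 1) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y):
    """I(X;Y) in bits for two equally long sequences of discrete labels."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log2(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi


def rank_genes_by_mi(data, labels, n_bins=3):
    """Return gene indices sorted by decreasing MI with the class label.

    `data` is a list of samples; each sample is a list of gene expression values.
    """
    scores = []
    for g in range(len(data[0])):
        column = [sample[g] for sample in data]
        scores.append((mutual_information(discretize(column, n_bins), labels), g))
    scores.sort(key=lambda t: (-t[0], t[1]))  # highest MI first, ties by index
    return [g for _, g in scores]


# Toy data (hypothetical): gene 0 separates the classes, gene 2 carries no information.
toy = [[1.0, 5.0, 0.2], [1.2, 3.0, 0.9], [5.0, 4.0, 0.1], [5.3, 5.5, 0.8]]
print(rank_genes_by_mi(toy, [0, 0, 1, 1]))  # -> [0, 1, 2]
```

A filter like this only ranks genes one at a time, so it cannot see redundancy between genes; in the method described above, the genetic algorithm is the component that searches over whole subsets.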

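The fitness function is described only as the ratio of the between-class distance to the within-class distance. One plausible concrete form, assumed here, uses squared Euclidean scatter on the selected genes: between = sum over classes c of n_c * ||m_c - m||^2, within = sum over classes c of the sum over samples x in c of ||x - m_c||^2, where m_c is the class centroid and m the overall centroid. The genetic algorithm that searches over 0/1 gene masks is likewise a generic textbook version (elitism, binary tournament selection, one-point crossover, bit-flip mutation). Its parameters and the names `class_distance_ratio` and `ga_select` are hypothetical.

```python
import random


def class_distance_ratio(data, labels, mask, eps=1e-12):
    """Assumed fitness: between-class scatter / within-class scatter on masked genes."""
    genes = [g for g, bit in enumerate(mask) if bit]
    if not genes:
        return 0.0
    points = [[sample[g] for g in genes] for sample in data]
    dim = len(genes)
    overall = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    between = within = 0.0
    for c in set(labels):
        members = [p for p, y in zip(points, labels) if y == c]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum((centroid[d] - overall[d]) ** 2 for d in range(dim))
        within += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members)
    return between / (within + eps)  # eps avoids division by zero


def ga_select(data, labels, pop_size=20, generations=40, p_mut=0.05, seed=0):
    """Generic GA over 0/1 gene masks that maximizes class_distance_ratio."""
    rng = random.Random(seed)
    n = len(data[0])
    cache = {}

    def fit(mask):
        key = tuple(mask)
        if key not in cache:  # memoize: masks repeat often in small searches
            cache[key] = class_distance_ratio(data, labels, mask)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fit, reverse=True)
        next_pop = [ranked[0][:], ranked[1][:]]  # elitism: keep the two best
        while len(next_pop) < pop_size:
            a = max(rng.sample(pop, 2), key=fit)  # binary tournament selection
            b = max(rng.sample(pop, 2), key=fit)
            cut = rng.randint(1, n - 1) if n > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fit)
```

With this form, adding a gene whose class means coincide raises only the within-class term, so the ratio drops; that is how such a fitness can discourage uninformative genes in a subset.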

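Because the selection is model-free, the chosen genes can be handed to any classifier and scored with k-fold cross-validation. The sketch below assumes a nearest-centroid classifier purely as a stand-in (it is not stated to be one the authors used) and a plain shuffled, non-stratified 5-fold split; the names `nearest_centroid_predict` and `k_fold_accuracy` are hypothetical.

```python
import random


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the class whose training centroid is closest (squared Euclidean)."""
    centroids = {}
    for c in set(train_y):
        members = [p for p, y in zip(train_x, train_y) if y == c]
        centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def k_fold_accuracy(data, labels, genes, k=5, seed=0):
    """k-fold CV accuracy using only the selected gene indices."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin split after shuffling

    def project(sample):
        return [sample[g] for g in genes]  # keep only the selected genes

    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_x = [project(data[i]) for i in idx if i not in held_out]
        train_y = [labels[i] for i in idx if i not in held_out]
        for i in fold:
            if nearest_centroid_predict(train_x, train_y, project(data[i])) == labels[i]:
                correct += 1
    return correct / len(data)
```

Swapping `nearest_centroid_predict` for any other classifier leaves the selected gene list unchanged, which is the practical meaning of model-free here. On very small or imbalanced datasets a stratified split would be the safer choice, since a plain shuffle can leave a class missing from a training fold.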

