Computer Science ›› 2015, Vol. 42 ›› Issue (5): 54-56. doi: 10.11896/j.issn.1002-137X.2015.05.011

• 2014 Data Mining Conference •

New Feature Selection Method Based on CHI

HUANG Yuan, LI Mao, LV Jian-cheng

  1. College of Computer Science, Sichuan University, Chengdu 610065, China
  • Online: 2018-11-14  Published: 2018-11-14
  • Funding: Supported by the Doctoral Program Foundation of the Ministry of Education of China



Abstract: CHI (the chi-square test) is a widely used feature selection method in text classification. It considers only the relevance between terms and classes while ignoring the relevance among the terms themselves, so the selected feature set carries considerable redundancy. This paper defined the concept of "residual mutual information" between terms and combined it with CHI to optimize the selection results, yielding feature sets that are both strongly representative and highly independent. Experimental results indicate that the method is effective.
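The paper's exact definition of "residual mutual information" is not reproduced on this page, so the following is only an illustrative sketch of the general idea the abstract describes: rank terms by their chi-square score against the class labels, then greedily discard candidates whose mutual information with an already selected term is too high. All function names, the toy corpus, and the `mi_threshold` value are assumptions for illustration, not the paper's implementation.

```python
import math

def chi2_term(docs, labels, term):
    """Chi-square statistic for one term against a binary class label.
    docs: list of token sets; labels: list of 0/1 class labels."""
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == 1)      # term present, class 1
    b = sum(1 for d, y in zip(docs, labels) if term in d and y == 0)      # term present, class 0
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == 1)  # term absent, class 1
    d = n - a - b - c                                                     # term absent, class 0
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def mutual_info(docs, t1, t2):
    """Mutual information (in nats) between the presence indicators of two terms."""
    n = len(docs)
    mi = 0.0
    for x in (True, False):
        for y in (True, False):
            pxy = sum(1 for d in docs if (t1 in d) == x and (t2 in d) == y) / n
            px = sum(1 for d in docs if (t1 in d) == x) / n
            py = sum(1 for d in docs if (t2 in d) == y) / n
            if pxy > 0:
                mi += pxy * math.log(pxy / (px * py))
    return mi

def select_features(docs, labels, k, mi_threshold=0.3):
    """CHI ranking followed by a mutual-information redundancy filter."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab, key=lambda t: chi2_term(docs, labels, t), reverse=True)
    selected = []
    for t in ranked:
        # Keep a term only if it is nearly independent of everything kept so far.
        if all(mutual_info(docs, t, s) < mi_threshold for s in selected):
            selected.append(t)
        if len(selected) == k:
            break
    return selected

# Toy corpus: token sets with binary class labels (1 = sports, 0 = politics).
docs = [
    {"ball", "sport"}, {"ball", "sport"}, {"ball", "sport", "vote"},
    {"vote", "law"}, {"vote", "law"}, {"law"},
]
labels = [1, 1, 1, 0, 0, 0]
print(select_features(docs, labels, k=2))  # → ['ball', 'vote']
```

In this toy run, "sport" scores as highly as "ball" under chi-square but is filtered out because the two terms always co-occur (high mutual information), which is exactly the kind of redundancy the abstract says plain CHI cannot remove.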

Key words: Text categorization, Feature selection, CHI, Mutual information

