Computer Science ›› 2015, Vol. 42 ›› Issue (7): 52-56. doi: 10.11896/j.issn.1002-137X.2015.07.012

• 2014 National Conference on Theoretical Computer Science •

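Multi-label Feature Selection Algorithm Based on Information Gain

LI Ling, LIU Hua-wen, XU Xiao-dan and ZHAO Jian-min

  1. College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua 321004, College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua 321004; Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100055, College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua 321004, College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua 321004
  • Online: 2018-11-14  Published: 2018-11-14
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61100119, 0, 61272468, 61170108, 9), the Open Project Fund of the National Laboratory of Pattern Recognition (201204214), the China Postdoctoral Science Foundation (2013M530072), the Zhejiang Provincial Natural Science Foundation (LY14F020012), and the Zhejiang Provincial Department of Education project (Y201328291).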

Abstract: Multi-label feature selection is a technique for improving the performance of multi-label classifiers. However, existing multi-label feature selection methods fail to balance computational complexity against the possible dependence among labels when producing a reasonable feature subset. This paper therefore proposes a multi-label feature selection algorithm based on information gain. The algorithm assumes that the features are independent of each other. It first uses the information gain between a single feature and the whole label set to measure their degree of correlation, and then removes the irrelevant and redundant features according to a threshold value to obtain the optimal feature subset. Experimental results show that the proposed algorithm effectively improves the classification performance of multi-label classifiers.

Key words: Data mining, Multi-label learning, Feature selection, Information gain
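
The filtering procedure outlined in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact scoring formula, so the code below makes several assumptions: features are discrete, each distinct label combination is treated as one value of a joint label variable, and the gain is computed as IG(f; L) = H(f) + H(L) - H(f, L). The function names and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """IG(f; L) = H(f) + H(L) - H(f, L).

    `labels` holds one hashable value per sample; here each value is the
    whole label vector of that sample (an assumption of this sketch).
    """
    joint = list(zip(feature, labels))
    return entropy(feature) + entropy(labels) - entropy(joint)


def select_features(X, Y, threshold):
    """Return the column indices of X whose gain reaches the threshold.

    X: list of rows of discrete feature values.
    Y: list of rows of 0/1 label indicators.
    threshold: hypothetical cut-off; the abstract does not state a value.
    """
    label_sets = [tuple(row) for row in Y]
    kept = []
    for j in range(len(X[0])):
        # Each feature is scored on its own, matching the stated
        # assumption that features are independent of each other.
        column = [row[j] for row in X]
        if information_gain(column, label_sets) >= threshold:
            kept.append(j)
    return kept
```

Because every feature is scored independently against the label set, the cost grows linearly with the number of features, which is one plausible reading of how the method keeps computation low while still looking at the labels jointly.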

