Computer Science (计算机科学) ›› 2017, Vol. 44 ›› Issue (10): 289-295, 317. doi: 10.11896/j.issn.1002-137X.2017.10.052

• Artificial Intelligence •

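Multi-label Feature Selection Algorithm Based on Label Weighting

LIN Meng-lei (林梦雷), LIU Jing-hua (刘景华), WANG Chen-xi (王晨曦) and LIN Yao-jin (林耀进)

  1. School of Mathematics and Statistics, Minnan Normal University, Zhangzhou 363000; Department of Automation, Xiamen University, Xiamen 361000; School of Computer Science, Minnan Normal University, Zhangzhou 363000; School of Computer Science, Minnan Normal University, Zhangzhou 363000
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61303131,1,61603173), the Natural Science Foundation of Fujian Province (2013J01028), and the Program for New Century Excellent Talents in Fujian Province Universities.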

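Abstract (Chinese version): In multi-label learning, feature selection is an effective way to handle the high dimensionality of multi-label data. Each label separates the samples to a different degree, which may provide useful information for multi-label learning. Based on this assumption, a multi-label feature selection algorithm based on label weighting is proposed. The algorithm first weights each label using the classification margin of the samples in the whole feature space. It then takes the ability of a feature to distinguish samples under the whole label set as that feature's weight, which measures the importance of the feature to the label set. Finally, the features are sorted in descending order of feature weight to obtain a new feature ranking. Experimental results on 6 multi-label datasets and 4 evaluation metrics show that the proposed algorithm outperforms several currently popular multi-label feature selection algorithms.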

Abstract: In multi-label learning, each sample is described by a feature vector and is simultaneously associated with multiple class labels. Feature selection removes irrelevant and redundant features, which makes it an effective way to overcome the curse of dimensionality in multi-label data. Labels differ in how well they separate the samples, which may provide useful information for multi-label learning. Based on this assumption, this paper proposes a multi-label feature selection algorithm based on label weighting. First, the margin of each sample in the whole feature space is calculated and used to weight the labels. Then, the distinguishability of each feature with respect to the label set is used to compute its feature weight, which measures the importance of the feature. Finally, all features are sorted by feature weight. Experiments were conducted on four multi-label datasets, and four evaluation criteria were used to measure the effectiveness of the method. The experimental results show that the proposed algorithm is superior to several state-of-the-art multi-label feature selection algorithms.

Key words: Feature selection, Label weighting, Classification margin, Multi-label classification
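
The three steps named in the abstract (weight the labels by sample margin, weight the features under the weighted label set, sort in descending order) can be illustrated with a small sketch. The abstract gives no formulas, so everything concrete here is an assumption rather than the authors' method: the Relief-style margin (distance to the nearest sample with a different label value minus distance to the nearest sample with the same value), the Manhattan distance, the clipping of negative label weights to zero, and the hypothetical names `nearest_hit_and_miss` and `rank_features`.

```python
def manhattan(a, b):
    """Manhattan distance between two feature vectors (an assumed metric)."""
    return sum(abs(u - v) for u, v in zip(a, b))


def nearest_hit_and_miss(X, y, i):
    """Return (distance, index) of the nearest same-label and different-label sample."""
    hit = miss = None
    for j in range(len(X)):
        if j == i:
            continue
        d = manhattan(X[i], X[j])
        if y[j] == y[i]:
            if hit is None or d < hit[0]:
                hit = (d, j)
        elif miss is None or d < miss[0]:
            miss = (d, j)
    return hit, miss


def rank_features(X, Y):
    """Rank features by a label-weighted, margin-based score (illustrative only)."""
    n, m, q = len(X), len(X[0]), len(Y[0])
    label_w = [0.0] * q                       # mean sample margin per label
    contrib = [[0.0] * m for _ in range(q)]   # per-label, per-feature margin
    for l in range(q):
        y = [row[l] for row in Y]
        used = 0
        for i in range(n):
            hit, miss = nearest_hit_and_miss(X, y, i)
            if hit is None or miss is None:
                continue  # this label value has no hit or no miss for sample i
            used += 1
            label_w[l] += miss[0] - hit[0]
            for f in range(m):
                contrib[l][f] += (abs(X[i][f] - X[miss[1]][f])
                                  - abs(X[i][f] - X[hit[1]][f]))
        if used:
            label_w[l] /= used
            contrib[l] = [v / used for v in contrib[l]]
    # Assumption: negative margins are clipped to zero, then weights sum to one.
    label_w = [max(w, 0.0) for w in label_w]
    total = sum(label_w)
    label_w = [w / total for w in label_w] if total > 0 else [1.0 / q] * q
    feat_w = [sum(label_w[l] * contrib[l][f] for l in range(q)) for f in range(m)]
    ranking = sorted(range(m), key=lambda f: feat_w[f], reverse=True)
    return ranking, feat_w, label_w


# Toy data: feature 0 determines label 0; label 1 is mostly noise.
X = [[0.0, 0.5], [0.1, 0.1], [0.2, 0.9], [1.0, 0.4], [0.9, 0.8], [1.1, 0.2]]
Y = [[0, 1], [0, 1], [0, 0], [1, 0], [1, 1], [1, 0]]
ranking, feat_w, label_w = rank_features(X, Y)
print(ranking, feat_w, label_w)
```

On this toy data the well-separated label receives all of the label weight and feature 0 is ranked first. A real evaluation would feed the top-ranked features to a multi-label classifier and compare evaluation metrics, as the abstract describes.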
