Computer Science, 2018, Vol. 45, Issue (6): 228-234. doi: 10.11896/j.issn.1002-137X.2018.06.041

• Artificial Intelligence •

Multi-label Feature Selection Algorithm Based on Improved Label Correlation

CHEN Fu-cai, LI Si-hao, ZHANG Jian-peng, HUANG Rui-yang   

  1. National Digital Switching System Engineering and Technological R&D Center, Zhengzhou 450002, China
  • Received: 2017-04-25 Online: 2018-06-15 Published: 2018-07-24

Abstract: Multi-label feature selection is one of the essential methods for overcoming the curse of dimensionality. It reduces the feature dimension, improves learning efficiency, and optimizes classification performance. However, many existing feature selection algorithms hardly take label correlation into consideration, and the range of information entropy values differs across data sets, which biases the results. To address these problems, this paper proposes a multi-label feature selection algorithm based on improved label correlation. The algorithm first uses symmetrical uncertainty to normalize the information entropy, and takes the normalized mutual information as the relationship measure to define label importance, with which the label-related terms in dependency and redundancy are weighted. Finally, a score function is put forward to evaluate feature importance, and the feature subset with the highest score is selected as the best one. Experiments demonstrate that, once a concise and accurate feature subset has been selected, both the performance and the efficiency of multi-label classification improve with discrete features.
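
As a rough illustration of the normalization step named in the abstract, the sketch below computes symmetrical uncertainty in its standard form, SU(X,Y) = 2·I(X;Y) / (H(X) + H(Y)), for two discrete variables. The function names are hypothetical and chosen only for this example. The abstract does not give the paper's label-importance weights or its score function, so they are not reproduced here.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def symmetrical_uncertainty(xs, ys):
    """Standard SU(X,Y) = 2*I(X;Y) / (H(X) + H(Y)), which lies in [0, 1].

    Returns 0.0 when both variables are constant (zero total entropy),
    a convention assumed here to avoid division by zero.
    """
    denom = entropy(xs) + entropy(ys)
    if denom == 0:
        return 0.0
    return 2.0 * mutual_information(xs, ys) / denom
```

Because SU is bounded between 0 and 1 whatever the entropies of the individual variables, values computed on different data sets become comparable, which appears to be the motivation the abstract gives for normalizing.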

Key words: Dependency, Feature score, Label correlation, Multi-label feature selection, Redundancy

CLC Number: TP391