Computer Science, 2020, Vol. 47, Issue (2): 44-50. doi: 10.11896/jsjkx.181202285

• Database & Big Data & Data Science •

Feature Selection Method Based on Rough Sets and Improved Whale Optimization Algorithm

WANG Sheng-wu, CHEN Hong-mei

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China
  2. Key Laboratory of Cloud Computing and Intelligent Technology, Southwest Jiaotong University, Chengdu 611756, China
  • Received: 2018-12-10 Online: 2020-02-15 Published: 2020-03-18
  • About author: WANG Sheng-wu, born in 1995, postgraduate, is a member of China Computer Federation (CCF). His main research interests include cloud computing and intelligent technology. CHEN Hong-mei, born in 1971, Ph.D., professor, Ph.D. supervisor, is a member of China Computer Federation (CCF). Her main research interests include granular computing, rough sets and intelligent information processing.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61572406).

Abstract: With the development of Internet and Internet of Things technologies, data collection has become easier, but the resulting high-dimensional data must be reduced in dimensionality. High-dimensional data contain many redundant and irrelevant features, which increase the computational complexity of a model and can even degrade its performance. Feature selection reduces the feature dimension and removes redundant features, which lowers the computational cost and improves the performance of a machine learning model; because it retains the original features of the data, it also offers good interpretability. It has become one of the important data preprocessing steps in machine learning. Rough set theory is an effective tool for feature selection: it preserves the characteristics of the original features while removing redundant information. However, traditional rough set-based feature selection methods have difficulty finding the globally optimal feature subset, because the cost of evaluating all feature subset combinations is very high. To overcome this problem, a feature selection method based on rough sets and an improved whale optimization algorithm is proposed. The improved whale optimization algorithm employs strategies of population optimization and disturbance to avoid being trapped in local optima. The algorithm first randomly initializes a set of feature subsets, and then uses an objective function based on rough set attribute dependency to evaluate the quality of each subset. Finally, the improved whale optimization algorithm iteratively searches for an acceptable, approximately optimal feature subset. Experimental results on UCI datasets show that the proposed algorithm can find feature subsets with less information loss and achieves higher classification accuracy when a support vector machine is used as the classifier for evaluation. Therefore, the proposed algorithm has a certain advantage in feature selection.
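
The attribute dependency that the objective function builds on can be illustrated with a minimal sketch. It assumes the classical Pawlak definition for discrete features, gamma_B(D) = |POS_B(D)| / |U|, where the positive region POS_B(D) collects the objects whose equivalence class under the selected features B carries a single decision label. The function name `dependency_degree` and the toy decision table are hypothetical, and the paper's actual objective function may combine this measure with other terms.

```python
from collections import defaultdict


def dependency_degree(rows, labels, subset):
    """Pawlak dependency gamma_B(D) = |POS_B(D)| / |U| for discrete features.

    rows   : list of tuples, one tuple of discrete feature values per object
    labels : list of decision labels, one per object
    subset : indices of the selected features (the candidate subset B)
    """
    if not rows:
        raise ValueError("rows must not be empty")
    # Group objects into equivalence classes by their values on the selected features.
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        classes[key].append(label)
    # A class belongs to the positive region when all of its objects share one label.
    positive = sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)
    return positive / len(rows)


# Hypothetical toy decision table: five objects, three discrete features.
rows = [
    (0, 0, 1),
    (0, 1, 1),
    (1, 0, 0),
    (1, 1, 0),
    (0, 0, 0),
]
labels = ["a", "b", "a", "b", "a"]

full = dependency_degree(rows, labels, [0, 1, 2])    # all features selected
single = dependency_degree(rows, labels, [1])        # only the second feature
```

In this toy table the second feature alone already determines the label, so a subset containing only that feature reaches the same dependency as the full feature set. A search procedure such as the whale optimization algorithm would look for small subsets with this property.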

Key words: Attribute dependency, Feature selection, Improved whale optimization algorithm, Optimal feature subset, Rough set theory

CLC Number: 

  • TP301.6