Computer Science, 2020, Vol. 47, Issue (8): 178-184. doi: 10.11896/jsjkx.190700089


K-Nearest Neighbor Classification Training Set Optimization Method Based on Genetic Instance and Feature Selection

DONG Ming-gang1,2, HUANG Yu-yang1, JING Chao1,2

  1 College of Information Science and Engineering, Guilin University of Technology, Guilin, Guangxi 541004, China
    2 Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin, Guangxi 541004, China
  • Online: 2020-08-15  Published: 2020-08-10
  • About author: DONG Ming-gang, born in 1977, Ph.D., professor, is a member of China Computer Federation. His main research interests include intelligent computing and its applications, and machine learning.
    JING Chao, born in 1983, Ph.D., associate professor, is a member of China Computer Federation. His main research interests include intelligent computing, deep reinforcement learning and optimization approaches.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61563012, 61802085, 61203109), Guangxi Natural Science Foundation (2014GXNSFAA118371, 2015GXNSFBA139260) and Guangxi Key Laboratory of Embedded Technology and Intelligent Systems (2018A-04).

Abstract: The classification performance of K-Nearest Neighbor (KNN) depends on the quality of the training set, so designing an efficient training-set optimization algorithm is important. Traditional evolutionary training-set optimization algorithms suffer from two major drawbacks: low efficiency and the mistaken removal of non-noise samples and features. To address these issues, this paper proposes a genetic training-set optimization algorithm. The algorithm uses an efficient genetic algorithm based on the maximum Hamming distance: each crossover preserves the parents and generates two new children with the largest Hamming distance, which improves efficiency while maintaining population diversity. The proposed algorithm combines a local noise-sample deletion strategy with a feature selection strategy. First, a decision tree is used to determine the range of noise samples; then the genetic algorithm removes the noise samples within this range and selects features simultaneously, which reduces the risk of mistaken deletion and improves efficiency. Finally, a 1NN-based validation-set selection strategy is used to improve the instance and feature selection accuracy of the genetic algorithm. Compared with the co-evolutionary instance and feature selection algorithm (IFS-CoCo), the weighted co-evolutionary instance and feature selection algorithm (CIW-NN), the evolutionary feature selection algorithm (EIS-RFS), the evolutionary instance selection algorithm (PS-NN) and traditional KNN, the proposed algorithm improves classification accuracy by 2.18%, 2.06%, 5.61%, 4.06% and 4.00% on average, respectively. The experimental results suggest that the proposed method achieves higher classification accuracy and optimization efficiency.
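The paper itself provides no code; the following is a minimal Python sketch, under one plausible reading of the abstract, of a joint instance/feature chromosome and a maximum-Hamming-distance crossover. All names (e.g. max_hamming_crossover) and the chromosome layout are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): children keep the bits on which
# both parents agree and receive complementary random bits wherever the parents
# disagree, so the two children are as far apart in Hamming distance as the
# preserved shared bits allow. Parents are kept alongside the children.
import numpy as np

rng = np.random.default_rng(0)

def max_hamming_crossover(parent_a: np.ndarray, parent_b: np.ndarray):
    diff = parent_a != parent_b                     # positions where parents disagree
    child1, child2 = parent_a.copy(), parent_a.copy()
    random_bits = rng.integers(0, 2, size=int(diff.sum()), dtype=parent_a.dtype)
    child1[diff] = random_bits
    child2[diff] = 1 - random_bits                  # complementary assignment
    return child1, child2

# Assumed chromosome layout: one bit per candidate noise instance (only the
# instances inside the decision-tree-identified noise range) followed by one
# bit per feature; 1 keeps the instance/feature, 0 removes it.
n_candidate_instances, n_features = 30, 10
parent_a = rng.integers(0, 2, n_candidate_instances + n_features)
parent_b = rng.integers(0, 2, n_candidate_instances + n_features)

c1, c2 = max_hamming_crossover(parent_a, parent_b)
# In the full algorithm, each chromosome would be scored by 1NN accuracy on a
# validation set, and parents would survive into the next generation.
print("Hamming distance between children:", int((c1 != c2).sum()))
```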

Key words: Decision tree, Feature selection, Genetic algorithm, Instance selection, K-nearest neighbor, Noise sample

CLC Number: TP181

[1]COVER T, HART P.Nearest neighbor pattern classification[J].IEEE Transactions on Information Theory, 1967, 13(1):21-27.
[2]ZHAN Y, DAI S, MAO Q, et al.A Video Semantic Analysis Method Based on Kernel Discriminative Sparse Representation and Weighted KNN[J].Computer Journal, 2018, 58(6):1360-1372.
[3]WANG Y, YANG Y W.KNN Similarity Graph Algorithm Based on Heap and Neighborhood Coexistence[J].Computer Science, 2018, 45(5):196-200, 227.
[4]FENG G L, ZHOU W G.Spark-based Parallel Outlier Detection Algorithm of K-nearest Neighbor[J].Computer Science, 2018, 45(S2):349-352, 366.
[5]DENG Z, ZHU X, CHENG D, et al.Efficient kNN classification algorithm for big data[J].Neurocomputing, 2016, 195(C):143-148.
[6]ZHANG S, LI X, MING Z, et al.Efficient kNN Classification With Different Numbers of Nearest Neighbors[J].IEEE Transactions on Neural Networks & Learning Systems, 2017, PP(99):1-12.
[7]ZHANG S, LI X, ZONG M, et al.Learning k for kNN Classification[J].ACM Transactions on Intelligent Systems & Technology, 2017, 8(3):43.
[8]GIL-PITA R, YAO X.Evolving edited k-nearest neighbor classifiers[J].International Journal of Neural Systems, 2009, 18(6):459-467.
[9]XIE H, LIANG D, ZHANG Z, et al.A Novel Pre-Classification Based kNN Algorithm[C]∥IEEE International Conference on Data Mining Workshops.New Orleans, America, 2017:1269-1275.
[10]LIU H, MOTODA H.Instance Selection and Construction for Data Mining[M].Springer International, 2001:448-454.
[11]WETTSCHERECK D, AHA D W, MOHRI T.A Review and Empirical Evaluation of Feature Weighting Methods for a Class of Lazy Learning Algorithms[J].Artificial Intelligence Review, 1997, 11(1/2/3/4/5):273-314.
[12]EIBEN A E, SCHOENAUER M.Evolutionary Computing[J].Soft Computing, 1998, 82(1):1-6.
[13]ACAMPORA G, TORTORA G, VITIELLO A.Applying SPEA2 to prototype selection for nearest neighbor classification[C]∥IEEE International Conference on Systems, Man, and Cybernetics.Montreal, Canada, 2017:003924-003929.
[14]SASIKALA S, APPAVU A B S, GEETHA S.A novel adaptive feature selector for supervised classification[J].Information Processing Letters, 2017, 117:25-34.
[15]KHIABANI A, SABBAGHI A.PHGA:Proposed hybrid genetic algorithm for feature selection in binary classification[C]∥International Conference on Information and Knowledge Technology.Tehran, Iran, 2017:147-154.
[16]DERRAC J, GARCÍA S, HERRERA F.IFS-CoCo:Instance and Feature Selection Based on Cooperative Coevolution With Nearest Neighbor Rule[J].Pattern Recognition, 2010, 43(6):2082-2105.
[17]DERRAC J, TRIGUERO I, GARCIA S, et al.Integrating in-stance selection, instance weighting, and feature weighting for nearest neighbor classifiers by coevolutionary algorithms[J].IEEE Transactions on Systems Man & Cybernetics Part B, 2012, 42(5):1383-1397.
[18]DERRAC J, CORNELIS C, GARCÍA S, et al.Enhancing evolutionary instance selection algorithms by means of fuzzy rough set based feature selection[J].Information Sciences, 2012, 186(1):73-92.
[19]DERRAC J, VERBIEST N, GARCÍA S, et al.On the use of evolutionary feature selection for improving fuzzy rough set based prototype selection[J].Soft Computing, 2013, 17(2):223-238.
[20]BRAHIM A B, LIMAM M.A hybrid feature selection method based on instance learning and cooperative subset search[J].Pattern Recognition Letters, 2016, 69:28-34.
[21]ESHELMAN L J.The CHC Adaptive Search Algorithm:How to Have Safe Search When Engaging in Nontraditional Genetic Recombination[J].Foundations of Genetic Algorithms, 1991, 1:265-283.
[22] GOLDBERG D E, SASTRY K.A Practical Schema Theorem for Genetic Algorithm Design and Tuning[C]∥Genetic and Evolutionary Computation Conference.San Francisco, America, 2001:328-335.
[23]HUANG Y Y, DONG M G, JING C.Genetic instance selection algorithm for k-nearest neighbor classifier[J].Journal of Computer Applications, 2018, 38(11):3112-3118.