计算机科学 ›› 2020, Vol. 47 ›› Issue (8): 178-184.doi: 10.11896/jsjkx.190700089
董明刚1, 2, 黄宇扬1, 敬超1, 2
DONG Ming-gang1, 2, HUANG Yu-yang1, JING Chao1, 2,
摘要: K近邻的分类性能依赖于训练集的质量。设计高效的训练集优化算法具有重要意义。针对传统的进化训练集优化算法效率较低、误删率较高的不足, 提出了一种遗传训练集优化算法。该算法采用基于最大汉明距离的高效遗传算法, 每次交叉保留父代并生成两个新的具有最大汉明距离的子代, 既提高了效率, 又保证了种群多样性。该算法将局部的噪声样本删除策略与特征选择策略相结合。首先使用决策树算法确定噪声样本存在的范围, 然后使用遗传算法精准删除此范围内的噪声样本和全局的噪声特征, 降低了误删率, 提高了效率。该算法采用基于最近邻规则的验证集选择策略, 进一步提高了遗传算法实例选择和特征选择的准确度。在15个标准数据集上, 该方法相较于协同进化实例特征选择算法IFS-CoCo、加权协同进化实例特征选择算法CIW-NN、进化特征选择算法EIS-RFS、进化实例选择算法PS-NN、K近邻算法KNN, 在分类精度上分别平均提升了2.18%, 2.06%, 5.61%, 4.06%和4.00%。实验结果表明, 所提方法的分类精度和优化效率优于当前的进化训练集优化算法。
中图分类号:
[1]COVER T, HART P.Nearest neighbor pattern classification[J].IEEE Transactions on Information Theory, 1967, 13(1):21-27. [2]ZHAN Y, DAI S, MAO Q, et al.A Video Semantic Analysis Method Based on Kernel Discriminative Sparse Representation and Weighted KNN[J].Computer Journal, 2018, 58(6):1360-1372. [3]WANG Y, YANG Y W.KNN Similarity Graph AlgorithmBased on Heap and Neighborhood Coexistence[J].Computer Science, 2018, 45(5):196-200, 227. [4]FENG G L, ZHOU W G.Spark-based Parallel Outlier Detection Algorithm of K-nearest Neighbor[J].Computer Science, 2018, 45(S2):349-352, 366. [5]DENG Z, ZHU X, CHENG D, et al.Efficient kNN classification algorithm for big data[J].Neurocomputing, 2016, 195(C):143-148. [6]ZHANG S, LI X, MING Z, et al.Efficient kNN Classification With Different Numbers of Nearest Neighbors[J].IEEE Tran-sactions on Neural Networks & Learning Systems, 2017, PP(99):1-12. [7]ZHANG S, LI X, ZONG M, et al.Learning k, for kNN Classification[J].Acm Transactions on Intelligent Systems & Techno-logy, 2017, 8(3):43. [8]GILPITA R, YAO X.Evolving edited k-nearest neighbor classifiers[J].International Journal of Neural Systems, 2009, 18(6):459-467. [9]XIE H, LIANG D, ZHANG Z, et al.A Novel Pre-Classification Based kNN Algorithm[C]∥IEEE International Conference on Data Mining Workshops.New Orleans, America, 2017:1269-1275. [10]LIU H, MOTODA H.Instance Selection and Construction forData Mining[M].Springer International, 2001:448-454. [11]WETTSCHERECK D, AHA D W, MOHRI T.A Review and Empirical Evaluation of Feature Weighting Methods for a Class of Lazy Learning Algorithms[J].Artificial Intelligence Review, 1997, 11(1/2/3/4/5):273-314. [12]EIBEN A E, SCHOENAUER M.Evolutionary Computing[J].Soft Computing, 1998, 82(1):1-6. [13]ACAMPORA G, TORTORA G, VITIELLO A.Applying SPEA2to prototype selection for nearest neighbor classification[C]∥IEEE International Conference on Systems, Man, andCyberne-tics.Montreal, Canada, 2017:003924-003929. [14]SASIKALA S, APPAVU A B S, GEETHA S.A novel adaptive feature selector for supervised classification[J].Information Processing Letters, 2017, 117:25-34. [15]KHIABANI A, SABBAGHI A.PHGA:Proposed hybrid genetic algorithm for feature selection in binary classification[C]∥International Conference on Information and Knowledge Technology.Tehran, Iran, 2017:147-154. [16]DERRAC J, GARCA S, HERRERA F.Ifs-CoCo:Instance and Feature Selection Based on Cooperative Coevolution With Nearest Neighbor Rule[J].Pattern Recognition, 2010, 43(6):2082-2105. [17]DERRAC J, TRIGUERO I, GARCIA S, et al.Integrating in-stance selection, instance weighting, and feature weighting for nearest neighbor classifiers by coevolutionary algorithms[J].IEEE Transactions on Systems Man & Cybernetics Part B, 2012, 42(5):1383-1397. [18]DERRAC J, CORNELIS C, GARCA S, et al.Enhancing evolutionary instance selection algorithms by means of fuzzy rough set based feature selection[J].Information Sciences, 2012, 186(1):73-92. [19]DERRAC J, VERBIEST N, GARCA S, et al.On the use of evolutionary feature selection for improving fuzzy rough set based prototype selection[J].Soft Computing, 2013, 17(2):223-238. [20]BRAHIM A B, LIMAM M.A hybrid feature selection method based on instance learning and cooperative subset search ☆[J].Pattern Recognition Letters, 2016, 69:28-34. [21]ESHELMAN L J.The CHC Adaptive Search Algorithm:How to Have Safe Search When Engaging in Nontraditional Genetic Recombination[J].Foundations of Genetic Algorithms, 1991, 1:265-283. [22] GOLDBERG D E, SASTRY K.A Practical Schema Theorem for Genetic Algorithm Design and Tuning[C]∥Genetic and Evolutionary Computation Conference.San Francisco, America, 2001:328-335. [23]HUANG Y Y, DONG M G, JING C.Genetic instance selection algorithm for k-nearest neighbor classifier[J].Journal of Computer Applications, 2018, 38(11):3112-3118. |
[1] | 李斌, 万源. 基于相似度矩阵学习和矩阵校正的无监督多视角特征选择 Unsupervised Multi-view Feature Selection Based on Similarity Matrix Learning and Matrix Alignment 计算机科学, 2022, 49(8): 86-96. https://doi.org/10.11896/jsjkx.210700124 |
[2] | 胡艳羽, 赵龙, 董祥军. 一种用于癌症分类的两阶段深度特征选择提取算法 Two-stage Deep Feature Selection Extraction Algorithm for Cancer Classification 计算机科学, 2022, 49(7): 73-78. https://doi.org/10.11896/jsjkx.210500092 |
[3] | 康雁, 王海宁, 陶柳, 杨海潇, 杨学昆, 王飞, 李浩. 混合改进的花授粉算法与灰狼算法用于特征选择 Hybrid Improved Flower Pollination Algorithm and Gray Wolf Algorithm for Feature Selection 计算机科学, 2022, 49(6A): 125-132. https://doi.org/10.11896/jsjkx.210600135 |
[4] | 杨浩雄, 高晶, 邵恩露. 考虑一单多品的外卖订单配送时间的带时间窗的车辆路径问题 Vehicle Routing Problem with Time Window of Takeaway Food ConsideringOne-order-multi-product Order Delivery 计算机科学, 2022, 49(6A): 191-198. https://doi.org/10.11896/jsjkx.210400005 |
[5] | 储安琪, 丁志军. 基于灰狼优化算法的信用评估样本均衡化与特征选择同步处理 Application of Gray Wolf Optimization Algorithm on Synchronous Processing of Sample Equalization and Feature Selection in Credit Evaluation 计算机科学, 2022, 49(4): 134-139. https://doi.org/10.11896/jsjkx.210300075 |
[6] | 孙林, 黄苗苗, 徐久成. 基于邻域粗糙集和Relief的弱标记特征选择方法 Weak Label Feature Selection Method Based on Neighborhood Rough Sets and Relief 计算机科学, 2022, 49(4): 152-160. https://doi.org/10.11896/jsjkx.210300094 |
[7] | 李宗然, 陈秀宏, 陆赟, 邵政毅. 鲁棒联合稀疏不相关回归 Robust Joint Sparse Uncorrelated Regression 计算机科学, 2022, 49(2): 191-197. https://doi.org/10.11896/jsjkx.210300034 |
[8] | 沈彪, 沈立炜, 李弋. 空间众包任务的路径动态调度方法 Dynamic Task Scheduling Method for Space Crowdsourcing 计算机科学, 2022, 49(2): 231-240. https://doi.org/10.11896/jsjkx.210400249 |
[9] | 任首朋, 李劲, 王静茹, 岳昆. 基于集成回归决策树的lncRNA-疾病关联预测方法 Ensemble Regression Decision Trees-based lncRNA-disease Association Prediction 计算机科学, 2022, 49(2): 265-271. https://doi.org/10.11896/jsjkx.201100132 |
[10] | 刘振宇, 宋晓莹. 一种可用于分类型属性数据的多变量回归森林 Multivariate Regression Forest for Categorical Attribute Data 计算机科学, 2022, 49(1): 108-114. https://doi.org/10.11896/jsjkx.201200189 |
[11] | 张叶, 李志华, 王长杰. 基于核密度估计的轻量级物联网异常流量检测方法 Kernel Density Estimation-based Lightweight IoT Anomaly Traffic Detection Method 计算机科学, 2021, 48(9): 337-344. https://doi.org/10.11896/jsjkx.200600108 |
[12] | 杨蕾, 降爱莲, 强彦. 基于自编码器和流形正则的结构保持无监督特征选择 Structure Preserving Unsupervised Feature Selection Based on Autoencoder and Manifold Regularization 计算机科学, 2021, 48(8): 53-59. https://doi.org/10.11896/jsjkx.200700211 |
[13] | 侯春萍, 赵春月, 王致芃. 基于自反馈最优子类挖掘的视频异常检测算法 Video Abnormal Event Detection Algorithm Based on Self-feedback Optimal Subclass Mining 计算机科学, 2021, 48(7): 199-205. https://doi.org/10.11896/jsjkx.200800146 |
[14] | 胡艳梅, 杨波, 多滨. 基于网络结构的正则化逻辑回归 Logistic Regression with Regularization Based on Network Structure 计算机科学, 2021, 48(7): 281-291. https://doi.org/10.11896/jsjkx.201100106 |
[15] | 吴善杰, 王新. 基于AGA-DBSCAN优化的RBF神经网络构造煤厚度预测方法 Prediction of Tectonic Coal Thickness Based on AGA-DBSCAN Optimized RBF Neural Networks 计算机科学, 2021, 48(7): 308-315. https://doi.org/10.11896/jsjkx.200800110 |
|