Computer Science (计算机科学) ›› 2022, Vol. 49 ›› Issue (4): 134-139. doi: 10.11896/jsjkx.210300075
CHU An-qi (储安琪), DING Zhi-jun (丁志军)
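Abstract: With the rapid growth of the Internet finance industry, traditional credit risk assessment is challenged by massive data volumes. Class imbalance among samples and high feature redundancy have become key factors limiting the classification accuracy of current assessment methods. To address these problems, this paper proposes a method based on the grey wolf optimization algorithm that performs sample undersampling and feature selection simultaneously. The method uses classifier performance as the heuristic information for the grey wolf optimizer and then searches intelligently for the optimal combination of samples and feature set; a tabu list strategy is introduced into the original grey wolf algorithm to keep it from becoming trapped in local optima. Experiments show a considerable improvement over other methods: results on different datasets demonstrate that the method effectively handles sample imbalance and reduces the dimensionality of the feature space while improving classification accuracy. On credit risk assessment, accuracy improves by about 3% compared with the original data, confirming the method's applicability and advantage in the credit evaluation domain.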
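The search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the encoding, the classifier, or how the tabu list is applied, so everything below is an assumption made for illustration. Here a solution is a bit vector (one bit per majority-class sample, one bit per feature), continuous wolf positions are binarized with a 0.5 threshold, the fitness is the balanced accuracy of a simple nearest-centroid classifier on a validation split, and the tabu list holds recently visited bit vectors and triggers a random bit flip on a repeat. The names `balanced_accuracy` and `gwo_select` are hypothetical.

```python
import random
from collections import deque


def balanced_accuracy(bits, X, y, X_val, y_val, maj):
    """Fitness (assumed): balanced accuracy of a nearest-centroid classifier.

    bits[:len(maj)] selects majority samples (label 0); the rest select features.
    All minority samples (label 1) are always kept. Both classes are assumed
    to be present in the training and validation data.
    """
    n_maj = len(maj)
    feats = [j for j, b in enumerate(bits[n_maj:]) if b]
    dropped = {maj[i] for i, b in enumerate(bits[:n_maj]) if not b}
    rows = [i for i in range(len(X)) if i not in dropped]
    if not feats or all(y[i] == 1 for i in rows):
        return 0.0  # degenerate: no features or no majority samples left
    centroids = {}
    for c in (0, 1):
        members = [X[i] for i in rows if y[i] == c]
        centroids[c] = [sum(m[j] for m in members) / len(members) for j in feats]
    hits, totals = {0: 0, 1: 0}, {0: 0, 1: 0}
    for x, label in zip(X_val, y_val):
        dist = {c: sum((x[j] - mu) ** 2 for j, mu in zip(feats, centroids[c]))
                for c in (0, 1)}
        pred = 0 if dist[0] <= dist[1] else 1
        totals[label] += 1
        hits[label] += int(pred == label)
    return 0.5 * (hits[0] / totals[0] + hits[1] / totals[1])


def gwo_select(X, y, X_val, y_val, n_wolves=8, n_iter=20, tabu_size=30, seed=0):
    """Grey wolf search over sample and feature masks with a simple tabu list."""
    rng = random.Random(seed)
    maj = [i for i, label in enumerate(y) if label == 0]
    dim = len(maj) + len(X[0])

    def binarize(pos):
        return tuple(int(p > 0.5) for p in pos)

    def fitness(pos):
        return balanced_accuracy(binarize(pos), X, y, X_val, y_val, maj)

    def rank(wolves):
        return sorted(((fitness(w), w) for w in wolves),
                      key=lambda s: s[0], reverse=True)

    wolves = [[rng.random() for _ in range(dim)] for _ in range(n_wolves - 1)]
    wolves.append([1.0] * dim)  # assumed seeding: one wolf keeps all samples and features
    tabu = deque((binarize(w) for w in wolves), maxlen=tabu_size)
    scored = rank(wolves)
    best_fit, best_bits = scored[0][0], binarize(scored[0][1])

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter            # decreases linearly from 2 toward 0
        leaders = [w for _, w in scored[:3]]  # alpha, beta, delta
        new_wolves = []
        for _, w in scored:
            pos = []
            for d in range(dim):
                est = 0.0
                for lead in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    est += lead[d] - A * abs(C * lead[d] - w[d])
                pos.append(min(1.0, max(0.0, est / 3)))  # mean of three estimates, clipped
            for _ in range(10):               # tabu check: flip a bit if already visited
                if binarize(pos) not in tabu:
                    break
                d = rng.randrange(dim)
                pos[d] = 1.0 - pos[d]
            tabu.append(binarize(pos))
            new_wolves.append(pos)
        scored = rank(new_wolves)
        if scored[0][0] > best_fit:
            best_fit, best_bits = scored[0][0], binarize(scored[0][1])
    return best_fit, best_bits
```

Calling `gwo_select(X, y, X_val, y_val)` with lists of numeric feature rows and 0/1 labels returns the best fitness found and the corresponding bit vector. Because one wolf starts with every sample and feature kept and the best-so-far solution is tracked, the returned fitness is never below the unreduced baseline on the same validation split; a real study would use a stronger classifier and cross-validation instead of a single split.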