Computer Science ›› 2023, Vol. 50 ›› Issue (1): 59-68. doi: 10.11896/jsjkx.220800191

• Database & Big Data & Data Science •

Credit Evaluation Model Based on Dynamic Machine Learning

CHEN Yijun, GAO Haoran, DING Zhijun   

  1. Key Laboratory of Embedded System and Service Computing of Ministry of Education, Tongji University, Shanghai 201804, China
    Shanghai Network Finance Security Collaborative Innovation Center, Tongji University, Shanghai 201804, China
  • Received: 2022-08-19 Revised: 2022-09-21 Online: 2023-01-15 Published: 2023-01-09
  • About author: CHEN Yijun, born in 2000, postgraduate. Her main research interests include data mining and machine learning.
    DING Zhijun, born in 1974, Ph.D., professor, Ph.D. supervisor, is a senior member of China Computer Federation. His main research interests include intelligent software engineering, cloud computing and services, big data credit reporting and financial risk control.
  • Supported by:
    Shanghai Science and Technology Innovation Action Plan(19511101300).

Abstract: With the development of computer technology, building automated evaluation models with machine learning algorithms has become an important tool for financial institutions to conduct credit evaluation. However, credit evaluation models still face challenges: credit data is class-imbalanced and high-dimensional, and customer behavior can be influenced by a changing external environment, that is, concept drift occurs. This paper therefore proposes a dynamic credit evaluation model that updates flexibly by using an ensemble learning algorithm to continuously add base classifiers trained on new incremental data, and by dynamically adjusting the weight of each base classifier to adapt to concept drift. When concept drift occurs, the model applies different forms of equalization and feature selection to the credit data according to the drift detection results. In particular, for feature selection, this paper proposes an incremental feature selection algorithm combined with the selection of representative samples, which makes feature selection efficient and accurate and enables the model to process high-dimensional imbalanced data while adapting to the concept drift of incremental credit data. Finally, this paper demonstrates that the proposed dynamic model is more efficient and accurate than other prevailing algorithms on real incremental high-dimensional credit datasets.
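
The ensemble update described in the abstract (add a base classifier per incremental data chunk, then re-weight all base classifiers) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the threshold-rule base learner `train_stump`, the weighting rule `max(0, 2 * accuracy - 1)`, the `max_members` cap, and all names are hypothetical choices made for illustration, and drift detection, equalization, and feature selection are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (features, label in {0, 1})
Classifier = Callable[[Sequence[float]], int]


def train_stump(chunk: List[Sample]) -> Classifier:
    """Fit a one-feature threshold rule (hypothetical stand-in base learner)."""
    best = None  # (training errors, feature index, threshold, polarity)
    for j in range(len(chunk[0][0])):
        for x, _ in chunk:
            t = x[j]
            for polarity in (0, 1):
                errors = sum(
                    (int(xi[j] > t) ^ polarity) != yi for xi, yi in chunk
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, polarity)
    _, j, t, polarity = best
    return lambda x: int(x[j] > t) ^ polarity


def accuracy(clf: Classifier, chunk: List[Sample]) -> float:
    return sum(clf(x) == y for x, y in chunk) / len(chunk)


@dataclass
class DynamicEnsemble:
    max_members: int = 5  # assumed cap on ensemble size
    members: List[Classifier] = field(default_factory=list)
    weights: List[float] = field(default_factory=list)

    def update(self, chunk: List[Sample]) -> None:
        # Train a new base classifier on the newest incremental chunk.
        self.members.append(train_stump(chunk))
        # Re-weight every member by its accuracy on the newest chunk
        # (assumed rule; the new member is scored on its own training data,
        # which is optimistic). Members no better than chance get weight 0.
        self.weights = [
            max(0.0, 2 * accuracy(m, chunk) - 1) for m in self.members
        ]
        # Drop the lowest-weight member if the ensemble is over capacity.
        if len(self.members) > self.max_members:
            drop = min(range(len(self.weights)), key=self.weights.__getitem__)
            del self.members[drop]
            del self.weights[drop]

    def predict(self, x: Sequence[float]) -> int:
        # Weighted vote with classifier outputs mapped to -1 / +1.
        vote = sum(
            w * (1 if m(x) == 1 else -1)
            for m, w in zip(self.members, self.weights)
        )
        return int(vote > 0)
```

In this sketch, a member trained before a drift loses its weight as soon as it stops agreeing with the newest chunk, so the vote is dominated by recently trained members.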

Key words: Credit evaluation, Feature selection, Concept drift, Sliding window, Dynamic model

CLC Number: 

  • TP3-05