Computer Science ›› 2022, Vol. 49 ›› Issue (11A): 210800105-7. doi: 10.11896/jsjkx.210800105

• Artificial Intelligence •

Study on Risk Control Model of Selective Ensemble Algorithm Based on Hierarchical Clustering and Simulated Annealing

WANG Mao-guang, JI Hao-yue, WANG Tian-ming

  1. School of Information, Central University of Finance and Economics, Beijing 100081, China
  • Online: 2022-11-10 Published: 2022-11-21
  • Corresponding author: JI Hao-yue (jihaoyuew@163.com)
  • About author: WANG Mao-guang (wangmg@cufe.edu.cn), born in 1974, Ph.D., professor, is a member of China Computer Federation. His main research interests include intelligent risk control models and algorithms, big data, and intelligent software engineering.
    JI Hao-yue, born in 1998, postgraduate. Her main research interests include Internet financial risk control and credit investigation.
  • Supported by:
    National Natural Science Foundation of China (62072487) and Research Projects of Central University of Finance and Economics (020676116004, 020676114004).

Abstract: Ensemble learning models can effectively address the weaknesses of a single model: an overly simple model structure, poor stability, and weak predictive ability. However, because of their structural complexity, ensemble models often suffer from low running efficiency and excessive storage cost, and selective ensemble algorithms are commonly used to optimize them. Existing selective ensemble algorithms still deliver only limited gains in effectiveness and efficiency. To address these shortcomings, a selective ensemble algorithm based on the Stacking ensemble framework is proposed. It mainly uses the agglomerative hierarchical clustering (AHC) algorithm and the Metropolis criterion of simulated annealing to select the types and number of base learners. For the empirical analysis, domestic and foreign online loan data are used separately to build the model. Experimental results show that the AHC-Metropolis selective ensemble model effectively improves computational efficiency, predictive ability, stability, and generalization ability. The model helps regulate the Internet finance industry, assists financial supervision, and provides an effective basis for establishing China's financial risk control management system and safeguarding national financial security.

Key words: Hierarchical clustering, Simulated annealing, Selective ensemble, Financial risk control

CLC Number: 

  • TP181
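
The abstract names two selection ingredients, agglomerative hierarchical clustering (AHC) and the Metropolis criterion of simulated annealing, but does not spell out how they are combined. The sketch below is one plausible reading, not the authors' implementation: learners are clustered on a pairwise distance matrix with average linkage, the best-scoring learner of each cluster is kept, and the subset is then refined by an annealing loop that uses the Metropolis acceptance rule. The function names, the distance matrix, the scores, the size penalty in `objective`, and the cooling schedule are all assumptions made for illustration; in practice the objective would be a validation metric of the Stacking model built on the candidate subset.

```python
import math
import random


def metropolis_accept(delta, temperature, rng):
    """Metropolis criterion: always accept an improvement (delta >= 0);
    accept a worse candidate with probability exp(delta / temperature)."""
    if delta >= 0:
        return True
    return rng.random() < math.exp(delta / temperature)


def agglomerative_clusters(dist, n_clusters):
    """Average-linkage agglomerative clustering on a symmetric distance matrix.
    Returns a list of clusters, each a sorted list of item indices."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


def select_learners(dist, scores, n_clusters, t0=1.0, cooling=0.9, steps=50, seed=0):
    """Hypothetical two-stage selection: cluster, then anneal over subsets."""
    rng = random.Random(seed)
    # Stage 1: keep the best-scoring learner of each cluster (drops near-duplicates).
    candidates = [max(c, key=lambda i: scores[i])
                  for c in agglomerative_clusters(dist, n_clusters)]

    def objective(subset):
        # Assumed proxy: mean individual score minus a small size penalty.
        return sum(scores[i] for i in subset) / len(subset) - 0.01 * len(subset)

    current, best, t = list(candidates), list(candidates), t0
    for _ in range(steps):
        # Stage 2: propose toggling one candidate in or out of the subset.
        i = rng.choice(candidates)
        proposal = [j for j in current if j != i] if i in current else current + [i]
        if proposal and metropolis_accept(objective(proposal) - objective(current), t, rng):
            current = proposal
            if objective(current) > objective(best):
                best = list(current)
        t *= cooling  # geometric cooling schedule (an assumption)
    return sorted(best)


# Illustrative inputs: pairwise disagreement between four base learners
# and their individual validation scores (made-up numbers).
example_dist = [
    [0.00, 0.10, 0.90, 0.80],
    [0.10, 0.00, 0.85, 0.90],
    [0.90, 0.85, 0.00, 0.20],
    [0.80, 0.90, 0.20, 0.00],
]
example_scores = [0.70, 0.75, 0.80, 0.65]
print(select_learners(example_dist, example_scores, n_clusters=2))
```

Clustering first keeps the annealing search space small, because only one representative per group of similar learners can enter the ensemble; the Metropolis rule then allows occasional worse subsets early on, which lowers the chance of settling on a poor local choice.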