Computer Science ›› 2022, Vol. 49 ›› Issue (11A): 210800105-7. doi: 10.11896/jsjkx.210800105
王茂光, 冀昊悦, 王天明
WANG Mao-guang, JI Hao-yue, WANG Tian-ming
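Abstract: Ensemble learning models can effectively address the limited structure, instability, and weak predictive power of single models. Because of their structural complexity, however, ensembles often suffer from low running efficiency and excessive storage cost, and selective ensemble algorithms are generally used to optimize them and resolve these problems. The selective ensemble algorithms proposed so far still deliver only modest gains in performance and efficiency. To address this, a selective ensemble algorithm based on the Stacking ensemble framework is proposed. The algorithm mainly uses agglomerative hierarchical clustering (AHC) and the Metropolis criterion from simulated annealing to screen the types and the number of base learners. For the empirical analysis, models were built on online lending data from both China and abroad. The experimental results show that the AHC-Metropolis selective ensemble model effectively improves computational efficiency, predictive ability, stability, and generalization ability. It helps regulate the Internet finance industry, supports financial supervision, and provides an effective basis for establishing China's financial risk-control management system and safeguarding national financial security.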
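The abstract names the Metropolis criterion as the rule for deciding which base learners to keep but does not give its details. The following is a minimal sketch of the general Metropolis acceptance rule from simulated annealing, applied to a single pass over candidate learners. It is not the authors' implementation: the function names `metropolis_accept` and `select_learners`, the parameters `t0` and `cooling`, the geometric cooling schedule, and the scoring callable `evaluate` are all assumptions made for illustration, and the AHC clustering step is omitted.

```python
import math
import random


def metropolis_accept(delta, temperature, rng=random):
    """Decide whether to accept a candidate change (Metropolis rule).

    delta: change in the score being maximised (new score - old score).
    temperature: current annealing temperature, must be > 0.
    rng: any object with a random() method returning a float in [0, 1).
    """
    if delta >= 0:
        return True  # improvements are always accepted
    # A worse candidate is accepted with probability exp(delta / T).
    return rng.random() < math.exp(delta / temperature)


def select_learners(candidates, evaluate, t0=1.0, cooling=0.9, seed=0):
    """One pass over candidate learners with Metropolis acceptance.

    candidates: list of learner names (hypothetical placeholders).
    evaluate: callable mapping a list of names to a score to maximise
              (assumed here; e.g. validation accuracy of the ensemble).
    t0, cooling: assumed initial temperature and geometric cooling rate.
    """
    rng = random.Random(seed)
    selected = [candidates[0]]
    score = evaluate(selected)
    temperature = t0
    for name in candidates[1:]:
        trial = selected + [name]
        trial_score = evaluate(trial)
        if metropolis_accept(trial_score - score, temperature, rng):
            selected, score = trial, trial_score
        temperature *= cooling  # lower temperature makes worse moves rarer
    return selected, score
```

In this sketch a learner that improves the ensemble score is always kept, while one that lowers it is kept only occasionally, and less often as the temperature falls. In a setting like the one the abstract describes, the candidates would presumably be drawn from the clusters produced by AHC, but that pairing is an assumption here.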