计算机科学(Computer Science), 2023, Vol.50, Issue 6A: 220600241-6. doi: 10.11896/jsjkx.220600241

• 大数据&数据科学 •

改进的森林优化特征选择算法在信用评估中的应用

黄宇航, 宋友, 王宝会   

  1. 北京航空航天大学软件学院 北京 100191
  • 出版日期:2023-06-10 发布日期:2023-06-12
  • 通讯作者: 王宝会(wangbh@buaa.edu.cn)
  • 作者简介:黄宇航(ITcathyh@buaa.edu.cn)

Improved Forest Optimization Feature Selection Algorithm for Credit Evaluation

HUANG Yuhang, SONG You, WANG Baohui   

  1. College of Software,Beihang University,Beijing 100191,China
  • Online:2023-06-10 Published:2023-06-12
  • About author: HUANG Yuhang, born in 1998, postgraduate. His main research interests include data mining and software engineering.
  WANG Baohui, born in 1973, senior engineer, master supervisor. His main research interests include software architecture, big data, artificial intelligence, etc.

摘要: 信用评估是金融领域的一个关键问题,它可以预测出一个用户是否存在拖欠风险,从而减少坏账损失。信用评估的关键挑战之一就是数据集存在着大量无效或冗余特征。为了解决该问题,提出了一种改进的森林优化特征选择算法(Improved Feature Selection using Forest Optimization Algorithm,IFSFOA)。该算法针对原始算法FSFOA的不足,在初始化阶段使用基于卡方检验的初始化策略代替随机初始化,提升算法寻优的能力;在局部播种阶段利用多层级变异策略,优化局部搜索能力,解决FSFOA的搜索空间受限和局部性问题;在更新候选森林时,使用贪婪选取策略挑选优质树,淘汰劣质树,收敛搜索的发散过程。最后在涵盖了低维、中维和高维的公开信用评估数据集上设置对比实验,结果表明IFSFOA在分类和维度缩减方面的综合表现均优于FSFOA和近年提出的较为高效的特征选择算法,验证了IFSFOA的有效性。

关键词: 森林优化算法, 特征选择, 信用评估, 演化计算, 包裹式方法

Abstract: Credit evaluation is a key problem in finance: it predicts whether a user is at risk of default and thus reduces bad-debt losses. One of the key challenges in credit evaluation is the presence of a large number of invalid or redundant features in the dataset. To solve this problem, an improved feature selection using forest optimization algorithm (IFSFOA) is proposed. It addresses the shortcomings of the original FSFOA algorithm in three ways: in the initialization phase, a chi-square test-based initialization strategy replaces random initialization to improve the algorithm's search capability; in the local seeding phase, a multi-level mutation strategy optimizes local search and resolves FSFOA's restricted search space and locality problems; when updating the candidate forest, a greedy selection strategy keeps high-quality trees and eliminates low-quality ones, converging the otherwise divergent search process. Finally, comparison experiments on public credit evaluation datasets covering low, medium and high dimensions show that IFSFOA outperforms both FSFOA and several efficient feature selection algorithms proposed in recent years in terms of classification and dimensionality reduction, validating its effectiveness.

Key words: Forest optimization algorithm, Feature selection, Credit evaluation, Evolutionary computation, Wrapper methods
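The abstract outlines three modifications to FSFOA. As a rough illustration of the first and third (chi-square-guided initialization of the forest and greedy updating of the candidate forest), here is a minimal Python sketch. The function names, the [0.2, 0.8] inclusion-probability range, and the toy binary-feature setting are illustrative assumptions, not the paper's exact scheme.

```python
import random

def chi2_stat(feature, labels):
    """Chi-square statistic of a binary feature against a binary class label,
    computed from the 2x2 contingency table."""
    a = sum(1 for f, l in zip(feature, labels) if f == 1 and l == 1)
    b = sum(1 for f, l in zip(feature, labels) if f == 1 and l == 0)
    c = sum(1 for f, l in zip(feature, labels) if f == 0 and l == 1)
    d = sum(1 for f, l in zip(feature, labels) if f == 0 and l == 0)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def chi2_init_forest(X, y, n_trees, rng):
    """Initialize the forest: each tree is a 0/1 feature mask, and features
    with higher chi-square scores are included with higher probability
    (instead of FSFOA's uniform random initialization)."""
    scores = [chi2_stat(col, y) for col in zip(*X)]
    top = max(scores) or 1.0
    # Map scores into [0.2, 0.8] so every feature stays reachable.
    probs = [0.2 + 0.6 * s / top for s in scores]
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(n_trees)]

def update_forest(forest, fitness, area_limit):
    """Greedy candidate-forest update: keep only the `area_limit` fittest
    trees and discard the rest, bounding the divergence of the search."""
    return sorted(forest, key=fitness, reverse=True)[:area_limit]
```

In a wrapper setting, `fitness` would evaluate a classifier on the masked feature subset; `fitness=sum` above stands in for that as a placeholder.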

中图分类号: TP3-05