Computer Science, 2024, Vol. 51, Issue (8): 45-55. doi: 10.11896/jsjkx.230900107
辛博, 丁志军
XIN Bo, DING Zhijun
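Abstract: With rapid socioeconomic development, credit business plays an increasingly important role in finance, and credit evaluation with machine learning algorithms has become the mainstream approach. Several problems remain open, however: delayed labels leave too little labeled data and cause the model to lag behind the data, and dynamic credit evaluation models lack interpretability. To address these problems, an interpretable credit evaluation model for delayed-label scenarios is proposed. The model extends the dynamic model tree with a weighting scheme and combines a delayed-label update algorithm with an adaptive-threshold pseudo-label selection strategy. It treats delayed-label data as two states, feedback data and pseudo-label data, and handles each separately, which balances the effects of insufficient labeled data and model lag while keeping the model interpretable. Finally, experiments on several synthetic and real-world credit evaluation datasets show that, compared with other mainstream algorithms, the model achieves a better trade-off between predictive performance and interpretability.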
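The two-state treatment of delayed-label data described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a small weighted online logistic regression stands in for the weighted dynamic model tree, and the threshold rule (a clipped running mean of prediction confidence), the class names `OnlineLogit` and `AdaptiveThreshold`, the function `run_stream`, and all parameter values are assumptions made only for illustration.

```python
import math
from collections import deque


class OnlineLogit:
    """Weighted online logistic regression (hypothetical stand-in for the model tree)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y, weight=1.0):
        # One weighted SGD step on the log loss.
        g = weight * (self.predict_proba(x) - y)
        self.w = [wi - self.lr * g * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * g


class AdaptiveThreshold:
    """Assumed rule: threshold follows a running mean of confidence, clipped to [lo, hi]."""

    def __init__(self, lo=0.6, hi=0.95, momentum=0.9):
        self.lo, self.hi, self.momentum = lo, hi, momentum
        self.value = hi  # start strict, relax as evidence accumulates

    def step(self, confidence):
        ema = self.momentum * self.value + (1.0 - self.momentum) * confidence
        self.value = min(self.hi, max(self.lo, ema))
        return self.value


def run_stream(stream, delay, model, threshold, pseudo_weight=0.3):
    """Process (x, y) pairs whose true label y only becomes visible `delay` steps later."""
    pending = deque()  # entries: (time the label arrives, x, y)
    n_pseudo = n_feedback = 0
    for t, (x, y) in enumerate(stream):
        # Feedback state: true labels that have arrived train the model at full weight.
        while pending and pending[0][0] <= t:
            _, x_old, y_old = pending.popleft()
            model.update(x_old, y_old, weight=1.0)
            n_feedback += 1
        # Pseudo-label state: a confident prediction is used as a down-weighted label.
        p = model.predict_proba(x)
        confidence = max(p, 1.0 - p)
        if confidence >= threshold.value:
            model.update(x, 1.0 if p >= 0.5 else 0.0, weight=pseudo_weight)
            n_pseudo += 1
        threshold.step(confidence)
        pending.append((t + delay, x, y))
    return n_pseudo, n_feedback
```

In this sketch an instance may be used twice, first with a down-weighted pseudo-label and later with its true label at full weight, which is one simple way to trade off scarce labeled data against model lag; the weighting and threshold rules in the paper may differ.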