Computer Science ›› 2024, Vol. 51 ›› Issue (8): 45-55. doi: 10.11896/jsjkx.230900107

• Database & Big Data & Data Science •

Interpretable Credit Evaluation Model for Delayed Label Scenarios

XIN Bo, DING Zhijun

  1. Key Laboratory of Embedded System and Service Computing of Ministry of Education (Tongji University), Shanghai 201804, China
    Shanghai Network Finance Security Collaborative Innovation Center (Tongji University), Shanghai 201804, China
  • Received: 2023-09-19 Revised: 2023-12-13 Online: 2024-08-15 Published: 2024-08-13
  • Corresponding author: DING Zhijun (dingzj@tongji.edu.cn)
  • About author: (2232941@tongji.edu.cn)
    XIN Bo, born in 2000, postgraduate. His main research interests include credit evaluation and machine learning.
    DING Zhijun, born in 1974, Ph.D, professor, Ph.D supervisor, is a senior member of CCF (No.14797S). His main research interests include intelligent software engineering, cloud computing and services, big data credit reporting and financial risk control.

Abstract: With rapid socioeconomic development, credit business plays an increasingly important role in the financial sector, and machine learning has become the mainstream approach to credit evaluation. Several problems remain open, however: delayed labels leave too little labeled data and cause the model to lag behind the data, and dynamic credit evaluation models lack interpretability. To address these problems, this paper proposes an interpretable credit evaluation model for delayed label scenarios. The model is a weighted improvement of the dynamic model tree, combined with a delayed label update algorithm and a pseudo-label selection strategy with adaptive thresholds. It treats delayed label data as two states, feedback data and pseudo-label data, and processes each separately, which balances the effects of insufficient labeled data and model lag while keeping the model interpretable. Experiments on synthetic and real credit evaluation datasets show that the model achieves a better trade-off between predictive performance and interpretability than other mainstream algorithms.

Key words: Credit evaluation, Delayed label, Interpretability, Dynamic model tree, Pseudo-label selection
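
The two-state handling of delayed label data described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the threshold rule, the sample weights, or the tree update, so everything below is an assumption made for illustration: `OnlineLogit` is a tiny weighted online logistic regression standing in for the weighted dynamic model tree, `AdaptiveThreshold` uses a hypothetical rule (mean of recent confidences, bounded below by a floor), and `process_stream`, `pseudo_weight`, and the event format are invented names, not the paper's algorithm.

```python
import math
from collections import deque


class OnlineLogit:
    """Weighted online logistic regression (assumed stand-in for the model tree)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y, weight=1.0):
        # One weighted SGD step on the log-loss.
        step = self.lr * weight * (self.predict_proba(x) - y)
        self.w = [wi - step * xi for wi, xi in zip(self.w, x)]
        self.b -= step


class AdaptiveThreshold:
    """Hypothetical rule: threshold = mean of recent confidences, never below a floor."""

    def __init__(self, floor=0.8, window=50):
        self.floor = floor
        self.recent = deque(maxlen=window)

    def accept(self, confidence):
        if self.recent:
            threshold = max(self.floor, sum(self.recent) / len(self.recent))
        else:
            threshold = self.floor
        self.recent.append(confidence)
        return confidence >= threshold


def process_stream(events, model, selector, pseudo_weight=0.3):
    """events are ("arrive", id, features) or ("label", id, true_label) tuples."""
    pending = {}  # instances still waiting for their delayed label
    counts = {"pseudo": 0, "feedback": 0}
    for kind, key, payload in events:
        if kind == "arrive":
            pending[key] = payload
            p = model.predict_proba(payload)
            confidence = max(p, 1.0 - p)
            # Pseudo-label state: learn from confident predictions at reduced weight.
            if selector.accept(confidence):
                model.update(payload, 1 if p >= 0.5 else 0, weight=pseudo_weight)
                counts["pseudo"] += 1
        else:
            # Feedback state: the true label arrived, so update at full weight.
            x = pending.pop(key, None)
            if x is not None:
                model.update(x, payload, weight=1.0)
                counts["feedback"] += 1
    return counts
```

In this sketch a pseudo-labeled update is deliberately down-weighted so that a later true label can correct it; whether the paper weights the two states this way is not stated in the abstract.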

CLC Number:

  • TP3-05