Computer Science ›› 2021, Vol. 48 ›› Issue (11A): 429-434. doi: 10.11896/jsjkx.201000013

• Image Processing & Multimedia Technology •

Credit Risk Assessment Method of P2P Online Loan Borrowers Based on Deep Forest

WANG Xiao-xiao1, WANG Ting-wen1, MA Yu-ling2, FAN Jia-yi3, CUI Chao-ran1

  1. 1 School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan 250014, China
    2 School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, China
    3 School of Business, Qingdao University, Qingdao, Shandong 266000, China
  • Online: 2021-11-10 Published: 2021-11-12
  • Corresponding author: CUI Chao-ran (crcui@sdufe.edu.cn)
  • About author: WANG Xiao-xiao (xiaoxiao.wangq@aliyun.com), born in 1996, postgraduate. Her main research interests include data mining.
    CUI Chao-ran, born in 1987, professor, is a member of China Computer Federation. His main research interests include information retrieval, multimedia, recommender systems and machine learning.
  • Supported by:
    National Natural Science Foundation of China (61701281, 62077033).

Abstract: P2P online lending is a financial business model that has emerged in recent years, offering a low investment threshold, convenient and fast transactions, and low financing costs. Alongside its rapid growth, however, credit risk in the lending process has become increasingly prominent, and a steady stream of borrowers absconding or even committing fraud has cast a heavy shadow over the industry. To address this problem, a credit risk assessment method for P2P online loan borrowers based on deep forest is proposed. First, features are extracted from two types of data: the borrower's basic information and historical loan information. Then, a deep forest model is constructed from multi-grained scanning and cascade forest modules to predict borrower default. In addition, the Gini index is used to compute the feature importance scores of the random forests, and the Borda count method is used to fuse the resulting rankings, which provides a degree of interpretability for the model's predictions. On two public datasets, LendingClub and Paipaidai, the proposed method is compared with support vector machines, random forests, and wide and deep networks. The experiments show that the proposed method performs better, and that the feature importance scores are consistent with intuition and objective knowledge.

Key words: Credit risk assessment, Deep forest, Feature importance, Peer-to-peer lending, Unbalanced dataset
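
The interpretability step described in the abstract (Gini-based feature importance followed by Borda count fusion) can be illustrated with a minimal sketch. This is not the authors' implementation. The feature names and scores are hypothetical, and the scoring rule (a feature ranked r-th among n features, with r = 0 for the most important, earns n - 1 - r points) is one common Borda variant assumed here, since the abstract does not specify the exact rule. In a random forest, a feature's Gini importance is typically obtained by accumulating impurity decreases like the one computed by `gini_decrease` over all splits on that feature.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini index of a collection of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


def borda_fuse(importance_per_forest):
    """Fuse per-forest importance scores into one ranking by Borda count.

    Each forest ranks the features by its own scores. A feature at rank r
    (0 = most important) among n features earns n - 1 - r points. Points are
    summed over forests; ties are broken alphabetically for determinism.
    """
    points = Counter()
    for scores in importance_per_forest:
        ranked = sorted(scores, key=lambda f: (-scores[f], f))
        n = len(ranked)
        for rank, feature in enumerate(ranked):
            points[feature] += n - 1 - rank
    return sorted(points.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical importance scores from three forests (illustrative values only).
forests = [
    {"loan_amount": 0.50, "credit_grade": 0.30, "age": 0.20},
    {"loan_amount": 0.25, "credit_grade": 0.60, "age": 0.15},
    {"loan_amount": 0.45, "credit_grade": 0.40, "age": 0.15},
]
fused = borda_fuse(forests)
# fused == [("loan_amount", 5), ("credit_grade", 4), ("age", 0)]
```

Fusing ranks rather than raw scores keeps one forest with unusually large importance values from dominating the combined ordering, which is a plausible reason to prefer rank-based fusion in this setting.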

CLC Number:

  • TP391