Computer Science, 2021, Vol. 48, Issue (5): 197-201. doi: 10.11896/jsjkx.200900043

• Artificial Intelligence •

Wide and Deep Learning for Default Risk Prediction

NING Ting, MIAO De-zhuang, DONG Qi-wen, LU Xue-song

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2020-09-07 Revised: 2020-10-01 Online: 2021-05-15 Published: 2021-05-09
  • Corresponding author: LU Xue-song (xslu@dase.ecnu.edu.cn)
  • About author: NING Ting, born in 1996, postgraduate. Her main research interests include machine learning. (51185100026@stu.ecnu.edu.cn)
    LU Xue-song, born in 1985, Ph.D., associate professor, member of the China Computer Federation. His main research interests include FinTech, computational pedagogy and natural language processing.
  • Supported by:
    National Natural Science Foundation of China (U1711262, 61672234).


Abstract: Default risk control is a key business component of credit loan services and directly affects lenders' profitability and bad-debt rate. With the development of the mobile Internet, credit-based financial services have reached the general public, and default risk control has shifted from rule-based manual judgment to credit models that are built from large amounts of customer data and predict each customer's default probability. Relevant models fall into two groups: traditional machine learning models and deep learning models. The former are highly interpretable but less predictive; the latter are more predictive but less interpretable and prone to overfitting the training data. How to integrate the two has therefore long been an active research topic in credit modeling. Inspired by wide & deep learning models in recommendation systems, the proposed credit model uses traditional machine learning to capture features of the structured data and deep learning to capture features of the unstructured data. It then combines the two sets of learned features, transforms them with an additional linear layer, and finally outputs the predicted default probability. The model thus combines the advantages of traditional machine learning models and deep learning models. Experimental results show that it has a stronger capability to predict the default probability of customers.
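The abstract describes the architecture only at a high level. As an illustration, here is a minimal Python sketch of one possible forward pass, under stated assumptions. The "wide" branch, which stands in for the traditional machine learning component, is modeled as a linear map over structured features. The "deep" branch is modeled as a two-layer ReLU MLP over a numeric encoding of the unstructured data. The two feature vectors are concatenated, passed through one linear layer, and mapped to a probability with a sigmoid. The class `WideAndDeep`, its parameters, the layer sizes and the random initialization are hypothetical and are not taken from the paper. The abstract does not say which traditional learner the authors use, and training is omitted.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Sequence[float], w: Matrix, b: Sequence[float]) -> Vector:
    """Affine map y = W x + b; W is stored row-major, one row per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v: Sequence[float]) -> Vector:
    return [max(0.0, a) for a in v]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class WideAndDeep:
    """Hypothetical wide & deep default-risk scorer (forward pass only, untrained)."""

    def __init__(self, n_structured: int, n_unstructured: int,
                 hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.n_structured = n_structured
        self.n_unstructured = n_unstructured
        # Wide branch (assumption): a linear model over structured features,
        # standing in for the "traditional machine learning" component.
        self.w_wide = random_matrix(hidden, n_structured, rng)
        self.b_wide = [0.0] * hidden
        # Deep branch (assumption): a two-layer ReLU MLP over a numeric
        # encoding of the unstructured data, e.g. a text embedding.
        self.w_deep1 = random_matrix(hidden, n_unstructured, rng)
        self.b_deep1 = [0.0] * hidden
        self.w_deep2 = random_matrix(hidden, hidden, rng)
        self.b_deep2 = [0.0] * hidden
        # Output: one linear layer over the concatenated wide + deep features.
        self.w_out = random_matrix(1, 2 * hidden, rng)
        self.b_out = [0.0]

    def predict_proba(self, structured: Sequence[float],
                      unstructured: Sequence[float]) -> float:
        """Return the predicted probability that the customer defaults."""
        if len(structured) != self.n_structured or len(unstructured) != self.n_unstructured:
            raise ValueError("input sizes do not match the model's dimensions")
        wide = linear(structured, self.w_wide, self.b_wide)
        deep = relu(linear(unstructured, self.w_deep1, self.b_deep1))
        deep = relu(linear(deep, self.w_deep2, self.b_deep2))
        combined = wide + deep  # list concatenation: [h_wide; h_deep]
        logit = linear(combined, self.w_out, self.b_out)[0]
        return sigmoid(logit)


if __name__ == "__main__":
    model = WideAndDeep(n_structured=3, n_unstructured=4)
    p = model.predict_proba([0.2, 1.0, 0.0], [0.1, 0.0, 0.5, 0.3])
    print(f"predicted default probability: {p:.3f}")
```

Because the weights are random and untrained, the printed number only shows that the output lies in (0, 1). In practice the weights would be fitted on labeled default outcomes, for example by minimizing binary cross-entropy. That training setup is also an assumption, not something the abstract states.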

Key words: Deep learning, Default risk control, Machine learning, Wide & deep learning models

CLC number: TP520