Computer Science, 2019, Vol. 46, Issue (10): 7-13. doi: 10.11896/jsjkx.181102216

• Big Data & Data Science •

Individual Credit Risk Assessment Based on Stacked Denoising Autoencoder Networks

YANG De-jie1, ZHANG Ning1, YUAN Ji2, BAI Lu1

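  1. School of Information, Central University of Finance and Economics, Beijing 100081, China
  2. College of Civil, Geo and Environmental Engineering, Technical University of Munich, Munich 80333, Germany
  • Received: 2018-11-29  Revised: 2019-04-15  Online: 2019-10-15  Published: 2019-10-21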
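  • Corresponding author: ZHANG Ning (born 1975), female, Ph.D., professor; her main research interests include financial technology and personal information protection. E-mail: zhangning@cufe.edu.cn.
  • About the authors: YANG De-jie (born 1987), male, Ph.D. candidate, senior engineer; his main research interests include machine learning and financial risk control. E-mail: yangdejiejay@163.com. YUAN Ji (born 1985), male, Ph.D., teaching assistant, senior engineer; his main research interests include Bayesian inverse analysis and stochastic finite element methods. BAI Lu (born 1987), male, Ph.D., associate professor, CCF member; his main research interests include machine learning and feature selection.
  • Funding:
    This work was supported by the National Key R&D Program of China (2017YFB1400701) and the Key Program of the National Social Science Foundation of China (13AXW010).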


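Abstract: Personal credit has long been the most important factor banks use to measure an individual's risk of failing to meet repayment obligations. In recent years, as borrowing demand in China has grown rapidly, traditional personal credit evaluation based solely on credit card information can no longer fully meet the development needs of the banking industry. To build a richer credit profile of each customer, this paper therefore extracts credit risk assessment features from bank big data. To address the curse of dimensionality and the noise that financial big data brings, and taking full account of the correlation among data features, the stacked denoising autoencoder (SDAE) neural network is improved by introducing a truncated Karhunen-Loève expansion as the noise input term. A series of experiments is conducted on the big data platform of a commercial bank. The results show that, compared with using credit card information alone, using bank big data raises the K-S value, an indicator of how well positive and negative samples are separated, by about 11%, and that the improved SDAE method gives better risk assessment, with accuracy about 3% higher than the original model. These results verify the effectiveness of credit risk assessment in a bank big data environment.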

Key words: Credit risk assessment, Big data, Curse of dimensionality, Feature selection, Stacked denoising, Deep learning


CLC Number: TP181