Computer Science ›› 2019, Vol. 46 ›› Issue (11A): 595-598.

• Interdiscipline & Application •

互联网金融风险识别中类平衡处理方法对比研究——以拍拍贷为例

刘华玲1, 林蓓1, 恽文婧1, 丁宇杰2   

  1. 上海对外经贸大学统计与信息学院 上海201620
  2. 上海财经大学信息管理与工程学院 上海200433
  • 出版日期:2019-11-10 发布日期:2019-11-20
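  • About the author: LIU Hua-ling (born 1964), female, Ph.D., professor. Her main research interests are privacy protection and Internet finance. E-mail: liuhl99@126.com.
  • Supported by:
    The Shanghai Philosophy and Social Science Planning Project (2018BJB023) and the Major Program of the National Social Science Foundation of China (16ZDA055).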

Comparison of Balancing Methods in Internet Finance Overdue Recognition: Taking PPDai.com as a Case

LIU Hua-ling1, LIN Bei1, YUN Wen-jing1, DING Yu-jie2   

  1. School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620, China
  2. School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China
  • Online: 2019-11-10  Published: 2019-11-20

摘要: 互联网金融的快速发展使P2P网贷成为一种创新的金融模式,如何识别网贷中的潜在风险成为研究热点。网贷交易数据常常存在严重的类不平衡,导致逾期风险识别率较低。针对这一问题,文中采用随机下采样、SMOTE和Bagging方法进行类平衡处理,并利用逻辑回归(LR)和支持向量分类机(SVC)进行检验评价。实验表明,在P2P风险识别中,以召回率为标准,Bagging的平衡处理效果优于随机下采样与SMOTE;且逻辑回归不存在明显的过拟合,因此比SVC更适合用于P2P逾期风险识别。

关键词: 集成学习, 类不平衡, 逾期识别, 重采样

Abstract: With the rapid development of Internet finance, P2P network lending has become an innovative financing method for SMEs and individuals, and identifying its potential risks has become a hot research issue. However, the serious imbalance between overdue and non-overdue samples leads to a low overdue recognition rate. To address this problem, the paper applies random undersampling, SMOTE and Bagging to pre-process the data and balance the classes, and then compares the results using Logistic Regression (LR) and Support Vector Classification (SVC). The empirical results show that, measured by recall, Bagging balances the data more effectively than random undersampling and SMOTE in P2P overdue loan recognition. In addition, LR shows no obvious over-fitting and is therefore more suitable than SVC for P2P overdue loan recognition.

Key words: Class imbalance, Ensemble learning, Overdue loan recognition, Resampling
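
The two resampling strategies named in the abstract, and the recall criterion used to compare them, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`random_undersample`, `smote`, `recall`), the toy data, the neighbour count `k=3`, and the choice to balance the classes to a 1:1 ratio are all assumptions made for illustration. Bagging and the LR/SVC classifiers are not shown.

```python
import random
from collections import Counter


def random_undersample(X, y, minority=1, seed=0):
    """Keep every minority row plus an equally sized random subset of majority rows.

    Assumes binary labels and that the majority class is at least as large
    as the minority class.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    kept = minority_idx + rng.sample(majority_idx, len(minority_idx))
    rng.shuffle(kept)
    return [X[i] for i in kept], [y[i] for i in kept]


def smote(X, y, minority=1, k=3, seed=0):
    """SMOTE-style oversampling: add synthetic minority rows until classes match.

    Each synthetic row is a random point on the segment between a minority
    row and one of its k nearest minority neighbours (Euclidean distance).
    Assumes binary labels and at least two minority rows.
    """
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_new = (len(y) - len(minority_rows)) - len(minority_rows)
    synthetic = []
    for _ in range(max(n_new, 0)):
        base = rng.choice(minority_rows)
        # k nearest minority neighbours of the base row, excluding the row itself
        neighbours = sorted(
            (row for row in minority_rows if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return X + synthetic, y + [minority] * len(synthetic)


def recall(y_true, y_pred, positive=1):
    """Share of truly positive (overdue) samples that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical toy data: 16 non-overdue (label 0) and 4 overdue (label 1) loans.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [1 if i >= 16 else 0 for i in range(20)]

X_under, y_under = random_undersample(X, y)
X_smote, y_smote = smote(X, y)

print(sorted(Counter(y_under).items()))  # [(0, 4), (1, 4)]
print(sorted(Counter(y_smote).items()))  # [(0, 16), (1, 16)]
print(recall([1, 1, 0, 0], [1, 0, 0, 1]))  # 0.5
```

Undersampling discards majority rows, while the SMOTE-style step keeps all original rows and adds interpolated minority rows. In either case only the training split should be resampled, so that recall is measured on data with the original class distribution.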

CLC Number: F832.39