Computer Science, 2019, Vol. 46, Issue (11A): 595-598.

• Interdiscipline & Application •

Comparison of Balancing Methods in Internet Finance Overdue Recognition: Taking PPDai.com as a Case

LIU Hua-ling 1, LIN Bei 1, YUN Wen-jing 1, DING Yu-jie 2

  1. School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620, China
  2. School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China
  • Online: 2019-11-10 Published: 2019-11-20

Abstract: The rapid development of Internet finance has made P2P online lending an innovative financing channel for SMEs and individuals, so identifying its potential risks has become a pressing issue. However, because overdue and non-overdue samples are seriously imbalanced, the overdue recognition rate is low. To address this problem, the paper uses random undersampling, SMOTE and Bagging to pre-process the data, and then compares the results using Logistic Regression (LR) and Support Vector Classification (SVC). The empirical results show that Bagging balances the data more effectively than random undersampling and SMOTE in P2P overdue loan recognition. In addition, LR is more suitable than SVC for P2P overdue loan recognition because it shows no obvious over-fitting.
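The three balancing strategies named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the toy data, the function names, and the parameters `k`, `n_new` and `n_bags` are hypothetical, and the bagging variant shown (each bag pairs all overdue rows with a fresh, equally sized random subset of non-overdue rows) is an assumption, since the abstract does not say how Bagging was configured. No classifier is fitted here; in a setup like the one described, LR or SVC would be trained on each balanced set.

```python
import math
import random


def random_undersample(majority, minority, rng):
    """Keep all minority rows and an equally sized random subset of majority rows."""
    return rng.sample(majority, len(minority)), list(minority)


def smote(minority, n_new, k, rng):
    """Create n_new synthetic minority rows by interpolating between a random
    minority row and one of its k nearest minority neighbours (core SMOTE step).
    Needs at least two minority rows."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def balanced_bags(majority, minority, n_bags, rng):
    """Assumed bagging variant: every bag is an independent balanced undersample;
    one base classifier would be trained per bag and the votes combined."""
    return [random_undersample(majority, minority, rng) for _ in range(n_bags)]


rng = random.Random(0)
# Hypothetical toy data: 20 non-overdue rows, 4 overdue rows, 2 features each.
non_overdue = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
overdue = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(4)]

under_major, under_minor = random_undersample(non_overdue, overdue, rng)
synthetic_overdue = smote(overdue, n_new=16, k=3, rng=rng)
bags = balanced_bags(non_overdue, overdue, n_bags=5, rng=rng)
```

Undersampling discards most non-overdue rows, SMOTE keeps them all and adds interpolated overdue rows, and the bagging variant reuses the discarded rows across bags, which is one plausible reason an ensemble could balance the data more effectively.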

Key words: Class imbalance, Ensemble learning, Overdue loan recognition, Resampling

CLC Number: F832.39