Computer Science ›› 2016, Vol. 43 ›› Issue (6): 208-213. doi: 10.11896/j.issn.1002-137X.2016.06.042

• Artificial Intelligence •

Deep Random Forest for Churn Prediction

YANG Xiao-feng, YAN Jian-feng, LIU Xiao-sheng, YANG Lu

  1. School of Computer Science and Technology, Soochow University, Suzhou 215006, China; 2. School of Creative Media, City University of Hong Kong, Hong Kong 999077, China
  • Online: 2018-12-01  Published: 2018-12-01
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61373092, 61033013, 61272449, 61202029), the Major Project of the Jiangsu Provincial Department of Education (12KJA520004), the Key Project of the Jiangsu Province Science and Technology Support Program (BE2014005), and the Open Research Fund of the Guangdong Provincial Key Laboratory (SZU-GDPHPCL-2012-09).

Abstract: In the telecom operator domain, churn prediction models are the main means by which enterprise decision makers identify potential churn users, i.e., subscribers who are about to stop using the operator's services. Existing churn prediction models are built on shallow machine learning algorithms such as logistic regression, decision trees, neural networks, and random forests, but against the backdrop of big data these shallow algorithms can hardly reach higher accuracy on the prediction task. This paper therefore proposes a novel deep model, the deep random forest, which stacks traditional shallow random forests into a multi-layer structure with layer-wise training in order to obtain higher prediction accuracy. Extensive experiments on real operator data show that the deep random forest achieves better churn-prediction results than traditional shallow machine learning algorithms. Moreover, increasing the volume of training data further improves the predictive power of the deep random forest, which demonstrates the potential of deep models in a big-data environment.

Key words: Churn prediction, Deep random forest
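The abstract describes the model only at a high level: shallow random forests stacked into a multi-layer structure and trained layer by layer. The Python sketch below illustrates one plausible reading of that idea using scikit-learn, in which each layer's class-probability outputs are concatenated with the raw features and fed to the next layer. The class name DeepRandomForest, the layer count, the forest sizes, and the probability-augmentation scheme are illustrative assumptions rather than the authors' actual design; a more careful version would use out-of-fold probabilities so that later layers are not trained on each forest's in-sample predictions.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


class DeepRandomForest:
    """Illustrative sketch: random forests stacked and trained layer by layer."""

    def __init__(self, n_layers=3, n_estimators=100, random_state=0):
        self.n_layers = n_layers
        self.n_estimators = n_estimators
        self.random_state = random_state
        self.layers = []

    def fit(self, X, y):
        features = X
        for i in range(self.n_layers):
            forest = RandomForestClassifier(
                n_estimators=self.n_estimators,
                random_state=self.random_state + i,
                n_jobs=-1,
            )
            forest.fit(features, y)  # layer-wise training: each layer is fit in turn
            self.layers.append(forest)
            # Concatenate the raw features with this layer's class probabilities
            # to form the input of the next layer.
            features = np.hstack([X, forest.predict_proba(features)])
        return self

    def predict_proba(self, X):
        features = X
        proba = None
        for forest in self.layers:
            proba = forest.predict_proba(features)
            features = np.hstack([X, proba])
        return proba  # output of the final layer


if __name__ == "__main__":
    # Synthetic stand-in for operator data; churn prediction is a binary task.
    rng = np.random.RandomState(0)
    X = rng.randn(2000, 20)
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.randn(2000) > 0).astype(int)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    model = DeepRandomForest(n_layers=3).fit(X_tr, y_tr)
    print("held-out AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))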
