Computer Science, 2016, Vol. 43, Issue (7): 217-223. doi: 10.11896/j.issn.1002-137X.2016.07.039


Big Data-driven Complaint Prediction Model

ZHOU Wen-jie, YANG Lu and YAN Jian-feng   

Online: 2018-12-01  Published: 2018-12-01

Abstract: Because of fierce competition in the telecommunication (telco) industry, reducing the customer complaint rate and improving customer service are crucial for telecommunication companies to gain a competitive advantage. Accurately predicting complaint behavior in order to reduce the complaint rate has therefore become one of the most important tasks for telco operators. Traditional complaint prediction models focus only on classification algorithms and manual feature selection, and do not exploit the full power of telco big data. In this paper, we propose a big data-driven complaint prediction model on the Hadoop/Spark platform using efficient parallel random forests. To better explore the performance of the proposed method, we performed feature engineering not only on all data from the business support system (BSS) and the operations support system (OSS), but also on the customer service records (CSR). Moreover, several useful graph-based features and second-order features derived from the relationships between users were designed and used to improve predictive performance. Experimental results on real data from a telco operator in Shanghai show that training the complaint prediction model with more data sources and higher-dimensional data yields higher prediction accuracy than state-of-the-art algorithms. Based on these results, we took comfort measures toward the targeted users, which lowered their complaint rate and brought significant business value to the operator.

Keywords: Big data, Complaint prediction model, Feature engineering, Second-order feature, Graph-based features, Random forest
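The abstract names graph-based features and relationship-derived second-order features without defining them. The following is a minimal single-machine Python sketch under stated assumptions, not the authors' method: it assumes a hypothetical list of (caller, callee) contact pairs taken from call records and a hypothetical per-user past-complaint flag, and computes each user's degree, PageRank, and the complaint rate among the user's contacts (one plausible reading of a second-order, relationship-based feature). The helper names `build_graph`, `pagerank`, `neighbor_mean`, and `graph_features` are illustrative only.

```python
from collections import defaultdict

# Hypothetical helpers for illustration; not taken from the paper.


def build_graph(edges):
    """Undirected contact graph from (caller, callee) pairs; self-calls are ignored."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def pagerank(adj, damping=0.85, iters=50):
    """Power-iteration PageRank; every node in adj has at least one neighbor."""
    n = len(adj)
    if n == 0:
        return {}
    rank = {u: 1.0 / n for u in adj}
    for _ in range(iters):
        new = {u: (1.0 - damping) / n for u in adj}
        for u, nbrs in adj.items():
            share = damping * rank[u] / len(nbrs)  # split u's rank evenly among its contacts
            for v in nbrs:
                new[v] += share
        rank = new
    return rank


def neighbor_mean(adj, values):
    """Mean of a per-user attribute over each user's contacts (missing users count as 0)."""
    return {u: sum(values.get(v, 0.0) for v in nbrs) / len(nbrs) for u, nbrs in adj.items()}


def graph_features(edges, past_complaints):
    """Assemble per-user graph-based and relationship-based (assumed second-order) features."""
    adj = build_graph(edges)
    pr = pagerank(adj)
    nb = neighbor_mean(adj, past_complaints)
    return {
        u: {
            "degree": len(adj[u]),
            "pagerank": pr[u],
            "neighbor_complaint_rate": nb[u],
        }
        for u in adj
    }


if __name__ == "__main__":
    # Toy contact graph: user C talks to everyone; B and D complained before.
    calls = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
    complained_before = {"B": 1.0, "D": 1.0}
    for user, feats in sorted(graph_features(calls, complained_before).items()):
        print(user, feats)
```

In this toy graph, user C has the highest degree and PageRank, and two of C's three contacts have complained before, so C's neighbor complaint rate is 2/3. Such per-user features would be appended to the BSS, OSS, and CSR attributes before training the classifier; at operator scale, a distributed implementation (for example, PageRank in Spark GraphX) would replace the single-machine loop shown here.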
