Computer Science ›› 2016, Vol. 43 ›› Issue (7): 217-223. doi: 10.11896/j.issn.1002-137X.2016.07.039

• Artificial Intelligence •

Big Data-driven Complaint Prediction Model

ZHOU Wen-jie, YANG Lu and YAN Jian-feng

  1. School of Computer Science and Technology, Soochow University, Suzhou 215006; School of Creative Media, City University of Hong Kong, Hong Kong 999077
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61373092, 61033013, 61272449, 61202029), the Major Project of the Jiangsu Provincial Department of Education (12KJA520004), and the Key Project of the Jiangsu Science and Technology Support Program (BE2014005).

Abstract: As competition in the telecommunications (telco) industry intensifies and customers demand higher service quality, complaint rates keep rising. Accurately predicting complaint behavior in order to reduce the complaint rate has therefore become a priority for telco operators. Traditional complaint prediction models focus only on classification algorithms and manually surveyed features, and do not make full use of operators' big data. This paper proposes a complaint prediction model built with parallel random forests on the Hadoop/Spark big data platform. Feature engineering draws not only on business support system (BSS) data but also on operations support system (OSS) data and customer service records (CSR), and is further extended with graph-based features that reflect relationships among users and with second-order features. Experiments on real data from a telco operator in Shanghai show that models trained on multi-source, high-dimensional features are clearly more accurate than traditional methods. Taking targeted appeasement measures for the users the model identifies can then reduce the complaint rate and bring considerable business value to the operator.

Key words: Big data, Complaint prediction model, Feature engineering, Second-order feature, Graph-based feature, Random forest
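The abstract names two feature families without defining them, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a second-order feature is a pairwise product of first-order features, and that a graph-based feature can be as simple as the share of a user's contacts who have complained before. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def second_order_features(row):
    """Append pairwise products x_i * x_j (i < j) to the first-order features.

    Assumption: "second-order" is read here as pairwise interaction terms.
    """
    return list(row) + [a * b for a, b in combinations(row, 2)]


def neighbor_complaint_ratio(edges, complained):
    """For each user, return the fraction of contacts who have complained.

    Assumption: edges are undirected contact links between users.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return {
        user: sum(1 for n in contacts if n in complained) / len(contacts)
        for user, contacts in neighbors.items()
    }


# Hypothetical toy data: four users and their contact links.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
complained = {"b", "d"}

ratios = neighbor_complaint_ratio(edges, complained)
features = second_order_features([2.0, 3.0, 0.5])
```

In a pipeline like the one the abstract describes, vectors of this kind would be joined with the BSS, OSS and CSR features before training the random forest; that training step is omitted here.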

