Computer Science, 2018, Vol. 45, Issue (1): 179-182. doi: 10.11896/j.issn.1002-137X.2018.01.031

• The 16th China Conference on Machine Learning •

Multi-view Ensemble Framework for Constructing User Profile

FEI Peng, LIN Hong-fei, YANG Liang, XU Bo and Gulziya ANIWAR

  1. Information Retrieval Laboratory, School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024 (FEI Peng, LIN Hong-fei, YANG Liang, XU Bo); School of Electronics and Information Engineering, Yili Normal University, Yining, Xinjiang 835000 (Gulziya ANIWAR)
  • Online: 2018-01-15 Published: 2018-11-13
  • Supported by:
    National Natural Science Foundation of China (61632011,2,61562080,9)

Abstract: Power grid customers who are sensitive to electricity charges often react strongly to service issues arising from electricity consumption, such as electricity quantity, price, charges, payment, and arrears. Quickly locating these charge-sensitive customers plays an important role in reducing the customer complaint rate, improving customer satisfaction, and establishing a good service image for power supply enterprises. Based on grid user data, this paper presents a multi-view ensemble framework for constructing user profiles, which can quickly and accurately identify charge-sensitive customers. First, the grid users are analyzed, and two channels are used to model and predict users with different characteristics separately. Second, a variety of feature extraction methods are proposed to build a multi-source feature system for users. Finally, to make full use of the multi-source features, a multi-view ensemble model based on two-layer XGBoost is proposed. The framework achieved an F1 score of 0.90379 (first place) in the "User Profile" contest of the 2016 CCF Big Data and Computational Intelligence Contest, which validates its effectiveness.

Key words: User profile, Multi-view learning, Model ensemble
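
The two-layer, multi-view ensemble named in the abstract can be illustrated with a minimal sketch. This is one plausible reading (stacking over per-view models), not the authors' implementation. To keep the example runnable with only the standard library, a tiny logistic regression stands in for XGBoost. The two views, the toy data, and all function names are hypothetical.

```python
import math
import random


def predict_one(weights, bias, x):
    """Probability from a logistic model (stand-in for an XGBoost learner)."""
    z = bias + sum(w * xj for w, xj in zip(weights, x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=200, lr=0.5):
    """Fit logistic regression with plain batch gradient descent."""
    n, d = len(rows), len(rows[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = predict_one(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def out_of_fold_scores(view_rows, labels, n_folds=3):
    """Layer 1 for one view: score each sample with a model that never saw it."""
    n = len(view_rows)
    scores = [0.0] * n
    for fold in range(n_folds):
        train_idx = [i for i in range(n) if i % n_folds != fold]
        w, b = train_logistic([view_rows[i] for i in train_idx],
                              [labels[i] for i in train_idx])
        for i in range(fold, n, n_folds):  # held-out samples of this fold
            scores[i] = predict_one(w, b, view_rows[i])
    return scores


def fit_two_layer(views, labels):
    """views: one feature matrix per view, all in the same sample order."""
    columns = [out_of_fold_scores(v, labels) for v in views]
    meta_rows = [list(row) for row in zip(*columns)]  # one score per view
    meta_model = train_logistic(meta_rows, labels)    # layer 2 combines views
    base_models = [train_logistic(v, labels) for v in views]  # refit on all data
    return base_models, meta_model


def predict_two_layer(base_models, meta_model, sample_views):
    scores = [predict_one(w, b, x) for (w, b), x in zip(base_models, sample_views)]
    return predict_one(meta_model[0], meta_model[1], scores)


# Toy data: two hypothetical views of the same 60 users.
rng = random.Random(0)
labels = [i % 2 for i in range(60)]
view_a = [[y + rng.gauss(0, 0.5)] for y in labels]
view_b = [[rng.gauss(0, 1.0), 2 * y - 1 + rng.gauss(0, 0.5)] for y in labels]

base_models, meta_model = fit_two_layer([view_a, view_b], labels)
p_pos = predict_two_layer(base_models, meta_model, [[1.0], [0.0, 1.0]])
p_neg = predict_two_layer(base_models, meta_model, [[0.0], [0.0, -1.0]])
print(p_pos, p_neg)
```

Training the second layer on out-of-fold scores, not in-sample scores, keeps it from simply trusting whichever first-layer model overfits most. In practice each `train_logistic` call would be replaced by a gradient-boosted tree model.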

