Computer Science (计算机科学), 2020, Vol. 47, Issue (6A): 424-428. doi: 10.11896/JsJkx.190900018

• Database & Big Data & Data Science •

Research on Prediction of Re-shopping Behavior of E-commerce Customers

LV Ze-yu, LI Ji-xuan, CHEN Ru-jian and CHEN Dong-ming

  1. Software College, Northeastern University, Shenyang 110167, China
  • Published: 2020-07-07
  • Corresponding author: CHEN Dong-ming (chendm@mail.neu.edu.cn)
  • Author e-mail: yuge0099@gmail.com
  • About author: LV Ze-yu, born in 1998, postgraduate. His main research interests include machine learning.
    CHEN Dong-ming, born in 1968, Ph.D., professor, Ph.D. supervisor, is a member of the China Computer Federation. His main research interests include complex networks, social network analysis, machine learning and information security.
  • Supported by:
    This work was supported by the National Training Program of Innovation and Entrepreneurship for Undergraduates (201910145222) and the Fundamental Research Funds for the Central Universities (N182410001).

Abstract: The study of customers' shopping behavior is a trending research topic and has significant commercial value for e-commerce companies. This paper studies the prediction of customers' re-shopping behavior on the same e-commerce platform. First, from the behavior and transaction records of customers and merchants, a variety of behavior prediction features are designed using feature engineering; the importance and characteristics of these features are compared and analyzed with visualization methods, and feature selection is performed. Then, based on the proposed features, several different algorithms are used to train prediction models. Experiments show that an ensemble of multiple lightGBM models achieves high accuracy in predicting re-shopping behavior, with an AUC of up to 0.7018. Moreover, the resulting predictor needs only a small number of features to obtain good prediction results. The experimental data set is open, real-world big data, and the research conclusions have both application and academic value.

Key words: Feature engineering, Feature visualization, Model ensemble, Re-shopping behavior prediction
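
The abstract reports the result of a multi-model ensemble as an AUC value but does not say how the individual lightGBM models are fused. The following is a minimal sketch, assuming the simplest fusion rule (averaging the predicted probabilities of several models) and computing AUC directly from its pairwise-ranking definition. It does not use lightGBM; the function names `auc` and `average_ensemble`, the labels, and the scores are hypothetical and chosen only for illustration.

```python
from statistics import mean


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_ensemble(model_scores):
    """Fuse models by averaging their predicted scores sample by sample.

    Assumption: plain averaging; the paper may use a different fusion rule.
    """
    return [mean(column) for column in zip(*model_scores)]


# Hypothetical toy data: 1 = the customer bought again, 0 = did not.
labels = [1, 0, 1, 0, 1, 0]

# Hypothetical scores from three separately trained models.
model_scores = [
    [0.9, 0.4, 0.3, 0.2, 0.6, 0.5],
    [0.6, 0.7, 0.8, 0.1, 0.5, 0.3],
    [0.7, 0.2, 0.6, 0.5, 0.4, 0.6],
]

fused = average_ensemble(model_scores)
for index, scores in enumerate(model_scores):
    print(f"model {index}: AUC = {auc(labels, scores):.3f}")
print(f"ensemble: AUC = {auc(labels, fused):.3f}")
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch but not for a large data set; a rank-based implementation would be the usual choice there. The toy numbers say nothing about the 0.7018 reported in the abstract.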

CLC Number:

  • TP181