Computer Science (计算机科学), 2020, Vol. 47, Issue (6A): 424-428. doi: 10.11896/JsJkx.190900018
吕泽宇, 李纪旋, 陈如剑, 陈东明
LV Ze-yu, LI Ji-xuan, CHEN Ru-jian and CHEN Dong-ming
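Abstract: Research on users' shopping behavior on e-commerce platforms has significant commercial value for e-commerce companies. This paper studies the problem of predicting whether a shopper will buy again on the same e-commerce platform. First, starting from the behavior and transaction records of users and merchants, a variety of behavior-prediction features are designed using feature engineering; the importance and characteristics of these features are compared and analyzed using visualization and other methods, and feature selection is performed. Then, several different algorithms are used to train prediction models on the proposed features. Experiments show that a fusion of multiple lightGBM models predicts repeat purchases most accurately, reaching an AUC of 0.7018, and that in the resulting predictor a small number of features account for most of the contribution to the prediction. The data come from an openly available, real-world large-scale dataset, and the results have both practical and academic value.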
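To make the evaluation described in the abstract concrete, the following is a minimal sketch of how an AUC score can be computed and how the scores of several models can be fused. It is an illustration only: the abstract does not state which fusion scheme the authors use, so plain score averaging is an assumption here, and the names `auc`, `average_fusion`, `model_a` and `model_b` and the toy data are hypothetical. The per-model scores stand in for the outputs of separately trained lightGBM models.

```python
from typing import List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation.

    Equals the probability that a randomly chosen positive sample is scored
    above a randomly chosen negative one, counting ties as one half.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores that starts at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_fusion(model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Fuse several models by averaging their per-sample scores (an assumption)."""
    n_models = len(model_scores)
    return [sum(column) / n_models for column in zip(*model_scores)]


# Toy data, made up for illustration: label 1 means the shopper bought again.
labels = [1, 0, 1, 0, 1, 0]
model_a = [0.9, 0.4, 0.3, 0.2, 0.8, 0.6]  # hypothetical scores of model A
model_b = [0.6, 0.5, 0.7, 0.1, 0.4, 0.3]  # hypothetical scores of model B
fused = average_fusion([model_a, model_b])

print("AUC A:", auc(labels, model_a))
print("AUC B:", auc(labels, model_b))
print("AUC fused:", auc(labels, fused))
```

On this toy data the averaged scores rank the samples better than either model alone, which is the usual motivation for fusing models whose errors differ; it says nothing about the size of the gain on the paper's dataset.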