Computer Science, 2023, Vol. 50, Issue (6A): 220300084-7. doi: 10.11896/jsjkx.220300084

• Big Data & Data Science •

基于决策树改进深度交叉网络的推荐模型

柯海萍, 毛宜军, 古万荣   

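  1. College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642
  • Publication date: 2023-06-10 Release date: 2023-06-12
  • Corresponding author: GU Wanrong (guwanrong@scau.edu.cn)
  • About the author: (kehaiping@stu.scau.edu.cn)
  • Funding:
    National Statistical Science Research Project (2020LY018); Guangdong Province Philosophy and Social Science Planning Project (GD19CGL34); Open Fund Project of the Guangdong Key Laboratory of Computational Science, Sun Yat-sen University (2021010); General Program of the Guangdong Natural Science Foundation (2022A1515011489); Guangzhou Key Laboratory of Intelligent Agriculture Project (201902010081)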

Recommendation Model Based on Decision Tree and Improved Deep & Cross Network

KE Haiping, MAO Yijun, GU Wanrong   

  1. College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
  • Online: 2023-06-10 Published: 2023-06-12
  • About author: KE Haiping, born in 1997, postgraduate, is a student member of China Computer Federation. Her main research interests include recommendation systems and deep learning. GU Wanrong, born in 1982, Ph.D., master. His main research interests include search engines, Internet big data analysis and mining, recommendation systems and biological information mining.
  • Supported by:
    National Statistical Science Research Project of China (2020LY018), Philosophy and Social Science Planning Project of Guangdong Province (GD19CGL34), Open Fund Project of Guangdong Key Laboratory of Computing Science, Sun Yat-sen University (2021010), General Program of Guangdong Natural Science Foundation (2022A1515011489) and Guangzhou Key Laboratory of Intelligent Agriculture Project (201902010081).

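Abstract: Feature mining is the key step by which a recommendation model learns the interactions between users and items, and it matters greatly for recommendation accuracy. Among existing feature mining models, linear logistic regression is simple and fits well, but it generalizes poorly and demands a large number of feature parameters. The deep cross network extracts feature crosses effectively, yet its ability to represent data features is still insufficient. This paper therefore introduces multiple residual structures and the idea of cross coding, and proposes a recommendation model that improves the deep cross network with a decision-tree-based method. First, a tree structure that builds enhanced features is designed on the basis of the GBDT algorithm, strengthening the model's deep mining of latent features. Second, the input dimension of the model's embedding layer is expanded and optimized. Finally, the improved deep cross network recommendation model is used for recommendation prediction. This design overcomes the generalization limits of existing models and strengthens representation ability while keeping the number of feature parameters small, so that hidden associations among users are mined effectively and recommendation accuracy improves. Experimental results on public test datasets show that the proposed model predicts better than existing feature interaction methods.

Keywords: feature mining, feature crossing, enhanced features, decision tree, recommendation model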

Abstract: Feature mining is a key step in learning the interactions between users and items in a recommendation model, and it is of great significance for improving recommendation accuracy. Among existing feature mining models, the linear logistic regression model is simple and can fit the data well, but its generalization ability is weak and it requires a large number of feature parameters. The Deep & Cross network can effectively extract feature crosses, but its ability to represent data features is still insufficient. Therefore, by introducing multiple residual structures and the idea of cross coding, an improved Deep & Cross network recommendation model based on decision trees is proposed. Firstly, a tree structure based on the GBDT algorithm is designed to construct enhanced features, which strengthens the model's deep mining of latent features. Secondly, the input dimension of the model's embedding layer is expanded and optimized. Finally, the improved Deep & Cross network recommendation model is used for recommendation prediction. This design not only overcomes the limitations of existing models in generalization ability, but also strengthens representation ability while keeping the number of feature parameters small, so as to effectively mine the hidden associations of users and improve recommendation accuracy. Experimental results on public test datasets show that the proposed model predicts better than existing feature interaction methods.

Key words: Feature mining, Feature crossover, Enhanced feature, Decision tree, Recommendation model
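
The two building blocks named in the abstract, tree-derived features and a cross layer, can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not say how the GBDT-based tree structure encodes its features, so this example assumes the common leaf-index one-hot encoding, and it uses hand-written depth-1 trees (`stumps`, hypothetical thresholds) in place of a fitted GBDT. The cross layer follows the standard Deep & Cross network form x_{l+1} = x_0 (x_l · w) + b + x_l; the multiple residual structure and embedding-dimension changes described above are not reproduced. All names and values are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# A depth-1 tree ("stump") as (feature index, threshold).
# Hypothetical stand-in for the trees a fitted GBDT would supply.
Stump = Tuple[int, float]


def leaf_index(stump: Stump, x: Sequence[float]) -> int:
    """Return 0 if the sample falls in the left leaf, 1 for the right leaf."""
    feature, threshold = stump
    return 0 if x[feature] <= threshold else 1


def tree_features(stumps: Sequence[Stump], x: Sequence[float]) -> List[float]:
    """One-hot encode the leaf reached in each tree and concatenate the codes."""
    encoded: List[float] = []
    for stump in stumps:
        one_hot = [0.0, 0.0]
        one_hot[leaf_index(stump, x)] = 1.0
        encoded.extend(one_hot)
    return encoded


def cross_layer(
    x0: Sequence[float],
    xl: Sequence[float],
    w: Sequence[float],
    b: Sequence[float],
) -> List[float]:
    """One cross layer: x_{l+1} = x0 * (xl . w) + b + xl."""
    dot = sum(xi * wi for xi, wi in zip(xl, w))
    return [x0i * dot + bi + xli for x0i, bi, xli in zip(x0, b, xl)]


# Illustrative sample with two raw features and two stumps (assumed values).
raw = [0.2, 3.0]
stumps = [(0, 0.5), (1, 2.0)]

# Enhanced input: raw features followed by the tree-derived one-hot codes.
enhanced = raw + tree_features(stumps, raw)

# Cross-layer parameters would be learned in practice; fixed here for clarity.
w = [0.1] * len(enhanced)
b = [0.0] * len(enhanced)
x1 = cross_layer(enhanced, enhanced, w, b)
```

In a real pipeline the stumps would be replaced by a trained gradient-boosted ensemble, and the cross layer weights would be learned jointly with a deep branch; the sketch only shows how tree-derived codes can widen the input that the cross layer then combines.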

CLC Number:

  • TP301.6