Computer Science ›› 2022, Vol. 49 ›› Issue (6): 149-157. doi: 10.11896/jsjkx.210600226

• Database & Big Data & Data Science •


  • Corresponding author: LAI Jun (2568754202@qq.com)
  • About author: (2206851664@qq.com)

Study on Intelligent Recommendation Method of Dueling Network Reinforcement Learning Based on Regret Exploration

HONG Zhi-li, LAI Jun, CAO Lei, CHEN Xi-liang, XU Zhi-xiong   

  1. Command & Control Engineering College, Army Engineering University of PLA, Nanjing 210007, China
  • Received: 2021-06-29  Revised: 2021-10-16  Online: 2022-06-15  Published: 2022-06-08
  • About author: HONG Zhi-li, born in 1994, postgraduate. His main research interests include deep reinforcement learning, recommendation systems and game confrontation.
    LAI Jun, born in 1979, postgraduate, associate professor, master supervisor. His main research interests include deep reinforcement learning and command information system engineering.



Abstract: In recent years, the application of deep reinforcement learning to recommendation systems has attracted much attention. Building on existing research, this paper proposes a new recommendation model, RP-Dueling, which extends the deep reinforcement learning algorithm Dueling-DQN with a regret-based exploration mechanism, allowing the algorithm to adaptively and dynamically adjust the "exploration-exploitation" ratio according to the degree of training. The algorithm can capture users' dynamic interests and fully explore the action space in recommendation systems with large-scale state spaces. Tested on multiple datasets, the proposed model achieves best average results of 0.16 on MAE and 0.43 on RMSE, which are 0.48 and 0.56 lower than the current best published results. Experimental results show that the proposed model outperforms both existing traditional recommendation models and recommendation models based on deep reinforcement learning.
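The two evaluation metrics reported above, MAE and RMSE, are the standard rating-prediction errors (lower is better, which is why results 0.48 and 0.56 below the previous best indicate an improvement). A minimal self-contained sketch of the two metrics, not the paper's evaluation code:

```python
import math

def mae(y_true, y_pred):
    # Mean absolute error: average of |true rating - predicted rating|.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root mean squared error: penalizes large rating errors more heavily than MAE.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```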

Key words: Deep reinforcement learning, Dueling-DQN, Dynamic interest, Recommendation system, Regret exploration, RP-Dueling
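Dueling-DQN, the basis of RP-Dueling, splits the Q-function into a state value V(s) and per-action advantages A(s,a), aggregated as Q(s,a) = V(s) + A(s,a) - mean over a' of A(s,a'). A minimal sketch of this aggregation step (following Wang et al.'s dueling architecture, not the authors' RP-Dueling implementation):

```python
def dueling_q(state_value, advantages):
    # Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a').
    # Subtracting the mean advantage keeps the V/A split identifiable.
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]
```

In a recommendation setting each action corresponds to a candidate item, so `advantages` holds one score per candidate and the resulting Q-values rank the items.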


  • CLC Number: TP181
Similar articles:
[1] CHENG Zhang-tao, ZHONG Ting, ZHANG Sheng-ming, ZHOU Fan.
Survey of Recommender Systems Based on Graph Learning
Computer Science, 2022, 49(9): 1-13. https://doi.org/10.11896/jsjkx.210900072
[2] WANG Guan-yu, ZHONG Ting, FENG Yu, ZHOU Fan.
Collaborative Filtering Recommendation Method Based on Vector Quantization Coding
Computer Science, 2022, 49(9): 48-54. https://doi.org/10.11896/jsjkx.210700109
[3] XIONG Li-qin, CAO Lei, LAI Jun, CHEN Xi-liang.
Overview of Multi-agent Deep Reinforcement Learning Based on Value Factorization
Computer Science, 2022, 49(9): 172-182. https://doi.org/10.11896/jsjkx.210800112
[4] QIN Qi-qi, ZHANG Yue-qin, WANG Run-ze, ZHANG Ze-hua.
Hierarchical Granulation Recommendation Method Based on Knowledge Graph
Computer Science, 2022, 49(8): 64-69. https://doi.org/10.11896/jsjkx.210600111
[5] FANG Yi-qiu, ZHANG Zhen-kun, GE Jun-wei.
Cross-domain Recommendation Algorithm Based on Self-attention Mechanism and Transfer Learning
Computer Science, 2022, 49(8): 70-77. https://doi.org/10.11896/jsjkx.210600011
[6] SHUAI Jian-bo, WANG Jin-ce, HUANG Fei-hu, PENG Jian.
Click-Through Rate Prediction Model Based on Neural Architecture Search
Computer Science, 2022, 49(7): 10-17. https://doi.org/10.11896/jsjkx.210600009
[7] QI Xiu-xiu, WANG Jia-hao, LI Wen-xiong, ZHOU Fan.
Fusion Algorithm for Matrix Completion Prediction Based on Probabilistic Meta-learning
Computer Science, 2022, 49(7): 18-24. https://doi.org/10.11896/jsjkx.210600126
[8] YU Bin, LI Xue-hua, PAN Chun-yu, LI Na.
Edge-Cloud Collaborative Resource Allocation Algorithm Based on Deep Reinforcement Learning
Computer Science, 2022, 49(7): 248-253. https://doi.org/10.11896/jsjkx.210400219
[9] LI Meng-fei, MAO Ying-chi, TU Zi-jian, WANG Xuan, XU Shu-fang.
Server-reliability Task Offloading Strategy Based on Deep Deterministic Policy Gradient
Computer Science, 2022, 49(7): 271-279. https://doi.org/10.11896/jsjkx.210600040
[10] CAI Xiao-juan, TAN Wen-an.
Improved Collaborative Filtering Algorithm Combining Similarity and Trust
Computer Science, 2022, 49(6A): 238-241. https://doi.org/10.11896/jsjkx.210400088
[11] HE Yi-chen, MAO Yi-jun, XIE Xian-fen, GU Wan-rong.
Matrix Transformation and Factorization Based on Graph Partitioning by Vertex Separator for Recommendation
Computer Science, 2022, 49(6A): 272-279. https://doi.org/10.11896/jsjkx.210600159
[12] XIE Wan-cheng, LI Bin, DAI Yue-yue.
PPO Based Task Offloading Scheme in Aerial Reconfigurable Intelligent Surface-assisted Edge Computing
Computer Science, 2022, 49(6): 3-11. https://doi.org/10.11896/jsjkx.220100249
[13] GUO Liang, YANG Xing-yao, YU Jiong, HAN Chen, HUANG Zhong-hao.
Hybrid Recommender System Based on Attention Mechanisms and Gating Network
Computer Science, 2022, 49(6): 158-164. https://doi.org/10.11896/jsjkx.210500013
[14] XIONG Zhong-min, SHU Gui-wen, GUO Huai-yu.
Graph Neural Network Recommendation Model Integrating User Preferences
Computer Science, 2022, 49(6): 165-171. https://doi.org/10.11896/jsjkx.210400276
[15] YU Ai-xin, FENG Xiu-fang, SUN Jing-yu.
Social Trust Recommendation Algorithm Combining Item Similarity
Computer Science, 2022, 49(5): 144-151. https://doi.org/10.11896/jsjkx.210300217