Computer Science ›› 2021, Vol. 48 ›› Issue (10): 1-18. doi: 10.11896/jsjkx.210200085

• Artificial Intelligence

Survey of Reinforcement Learning Based Recommender Systems

YU Li1, DU Qi-han1, YUE Bo-yan1, XIANG Jun-yao1, XU Guan-yu2, LENG You-fang1

  1. School of Information, Renmin University of China, Beijing 100872, China
  2. XUTELI School, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2021-02-08 Revised: 2021-05-21 Online: 2021-10-15 Published: 2021-10-18
  • Corresponding author: YU Li (buaayuli@ruc.edu.cn)
  • About author: YU Li, born in 1973, Ph.D., associate professor. His main research interests include deep learning and recommender systems.
  • Supported by:
    National Natural Science Foundation of China (71271209) and Research Foundation of Renmin University of China (2020030228).

Abstract: Recommender systems find and automatically recommend valuable information and services to users from massive data. They effectively alleviate information overload and have become an important information technology in the era of big data. However, data sparsity, cold start, and limited interpretability remain key technical difficulties that restrict the wide application of recommender systems. Reinforcement learning is an interactive learning technique: by interacting with users and obtaining feedback, it captures their interest drift in real time and dynamically models user preferences, and so can better address the classical key problems faced by traditional recommender systems. Reinforcement learning has become a hot research topic in the recommender systems field in recent years. This survey first briefly reviews recommender systems and reinforcement learning and analyzes how reinforcement learning can improve recommender systems. It then gives an overview of reinforcement learning based recommendation research in recent years, summarizing work on traditional reinforcement learning based recommendation and on deep reinforcement learning based recommendation separately. On this basis, it summarizes several research frontiers of reinforcement learning based recommendation in recent years, together with its applications. Finally, it analyzes and discusses the future development trends of reinforcement learning in recommender systems.

Key words: Deep reinforcement learning, Markov decision process, Multi-armed bandits, Recommender systems, Reinforcement learning
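
As a rough illustration of the interaction loop described in the abstract (recommend, observe feedback, update the preference estimate), the following is a minimal sketch of an epsilon-greedy multi-armed bandit recommender. It is not taken from the surveyed paper. The class name `EpsilonGreedyRecommender`, the function `simulate`, the click probabilities, and the point at which the simulated user's interest drifts are all hypothetical choices made for this example.

```python
import random


class EpsilonGreedyRecommender:
    """Toy multi-armed bandit in which each arm is one candidate item (illustrative only)."""

    def __init__(self, n_items, epsilon=0.1, step_size=0.1, rng=None):
        self.epsilon = epsilon          # probability of exploring a random item
        self.step_size = step_size      # constant step size, so old feedback fades over time
        self.values = [0.0] * n_items   # running reward estimate per item
        self.rng = rng if rng is not None else random.Random(0)

    def recommend(self):
        # Explore with probability epsilon, otherwise exploit the best current estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda i: self.values[i])

    def update(self, item, reward):
        # Move the estimate for the recommended item toward the observed feedback.
        self.values[item] += self.step_size * (reward - self.values[item])


def simulate(steps=2000, seed=0):
    """Simulate a user whose preferred item changes halfway through (assumed setup)."""
    rng = random.Random(seed)
    agent = EpsilonGreedyRecommender(n_items=3, rng=rng)
    click_prob = [0.8, 0.2, 0.1]            # hypothetical initial click probabilities
    for t in range(steps):
        if t == steps // 2:
            click_prob = [0.1, 0.2, 0.8]    # hypothetical interest drift
        item = agent.recommend()
        reward = 1.0 if rng.random() < click_prob[item] else 0.0
        agent.update(item, reward)
    return agent.values


if __name__ == "__main__":
    print(simulate())
```

The constant step size is a deliberate choice in this sketch: unlike a plain running average, it lets recent feedback outweigh old feedback, which is one simple way to follow a drifting preference.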

CLC Number:

  • TP183