计算机科学 (Computer Science) ›› 2021, Vol. 48 ›› Issue (10): 1-18. doi: 10.11896/jsjkx.210200085

• Artificial Intelligence •

Survey of Reinforcement Learning Based Recommender Systems

YU Li¹, DU Qi-han¹, YUE Bo-yan¹, XIANG Jun-yao¹, XU Guan-yu², LENG You-fang¹

  1 School of Information, Renmin University of China, Beijing 100872, China
    2 XUTELI School, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2021-02-08  Revised: 2021-05-21  Online: 2021-10-15  Published: 2021-10-18
  • Corresponding author: YU Li (buaayuli@ruc.edu.cn)
  • About author: YU Li, born in 1973, Ph.D., associate professor. His main research interests include deep learning and recommender systems.
  • Supported by: National Natural Science Foundation of China (71271209) and Research Foundation of Renmin University of China (2020030228).

Abstract: Recommender systems are devoted to finding and automatically recommending valuable information and services for users from massive data. They effectively alleviate the information overload problem and have become an important information technology in the era of big data. However, data sparsity, cold start, and limited interpretability remain key technical difficulties that restrict the wide application of recommender systems. Reinforcement learning is an interactive learning technique: by interacting with users and obtaining feedback, it captures their interest drift in real time and dynamically models user preferences, so it can better address the classical key issues faced by traditional recommender systems. Reinforcement learning has therefore become a hot research topic in the field of recommender systems. From a survey perspective, this paper first briefly reviews recommender systems and reinforcement learning, and analyzes how reinforcement learning can improve recommender systems. It then organizes and summarizes recent research on reinforcement learning based recommendation, covering traditional reinforcement learning based recommendation and deep reinforcement learning based recommendation respectively. On this basis, the paper highlights several research frontiers of reinforcement learning based recommendation in recent years, as well as its applications. Finally, the future development trends of reinforcement learning in recommender systems are analyzed and discussed.
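
The interactive setting described in the abstract is commonly formalized as a Markov decision process: the recommender agent observes a state summarizing the user's recent behavior, selects an item as its action, and receives the user's feedback (e.g., a click) as a reward, updating its policy online. The following is a minimal, illustrative Python sketch of this loop using tabular Q-learning with epsilon-greedy exploration; the item set, toy user simulator, and reward definition are assumptions made only for exposition and do not correspond to any specific method covered in this survey.

```python
import random
from collections import defaultdict

# Hypothetical, simplified setting: the state is the id of the last clicked item
# (or -1 for a cold-start user), actions are candidate item ids, and the reward
# is 1 for a simulated click and 0 otherwise.
N_ITEMS = 20
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

Q = defaultdict(float)  # Q[(state, action)] -> estimated long-term value


def simulated_user_feedback(state, item):
    """Toy user model (assumption): prefers items 'close' to the last clicked one."""
    if state < 0:
        return 1 if random.random() < 0.1 else 0
    return 1 if abs(item - state) <= 2 and random.random() < 0.8 else 0


def recommend(state):
    """Epsilon-greedy policy: explore a random item, otherwise exploit Q-values."""
    if random.random() < EPSILON:
        return random.randrange(N_ITEMS)
    return max(range(N_ITEMS), key=lambda a: Q[(state, a)])


def run_session(n_steps=50):
    state = -1  # cold-start user
    for _ in range(n_steps):
        action = recommend(state)
        reward = simulated_user_feedback(state, action)
        next_state = action if reward else state  # a clicked item becomes the new context
        # Standard Q-learning update, capturing the long-term value of a recommendation.
        best_next = max(Q[(next_state, a)] for a in range(N_ITEMS))
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state


if __name__ == "__main__":
    random.seed(0)
    for _ in range(200):
        run_session()
    print("Top recommendations after item 5:",
          sorted(range(N_ITEMS), key=lambda a: Q[(5, a)], reverse=True)[:3])
```

In the deep reinforcement learning approaches reviewed in the survey, the tabular Q-table is replaced by a neural value or policy network and the state is a learned representation of the user's interaction history, but the underlying interaction loop is the same.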

Key words: Recommender systems, Reinforcement learning, Deep reinforcement learning, Markov decision process, Multi-armed bandits
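
Among these keywords, the multi-armed bandit is the simplest interactive formulation discussed in the survey: each candidate item is treated as an arm, and the recommender balances exploring items whose appeal is still uncertain with exploiting items that have already performed well. Below is a minimal UCB1 sketch in Python against a simulated click model; the item set and click probabilities are illustrative assumptions only.

```python
import math
import random

# Illustrative arm set: each arm is an item with an (unknown) click probability.
TRUE_CTR = [0.05, 0.10, 0.02, 0.25, 0.15]  # assumed ground truth, used only for simulation
N_ARMS = len(TRUE_CTR)

counts = [0] * N_ARMS     # how many times each item has been recommended
rewards = [0.0] * N_ARMS  # cumulative clicks per item


def select_arm(t):
    """UCB1: pick the item with the highest optimistic estimate of its click rate."""
    for a in range(N_ARMS):  # recommend every item once before using UCB scores
        if counts[a] == 0:
            return a
    return max(range(N_ARMS),
               key=lambda a: rewards[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]))


random.seed(0)
for t in range(1, 10001):
    arm = select_arm(t)
    click = 1.0 if random.random() < TRUE_CTR[arm] else 0.0  # simulated user feedback
    counts[arm] += 1
    rewards[arm] += click

print("Recommendation counts per item:", counts)  # concentrates on the best item over time
```

Contextual-bandit variants condition these estimates on user and item features, which is how many of the interactive recommendation methods surveyed here personalize the exploration-exploitation trade-off.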

CLC Number: TP183