Survey of Reinforcement Learning Based Recommender Systems

doi:10.11896/jsjkx.210200085

Abstract

Recommender systems find and automatically recommend valuable information and services to users from massive data. They effectively alleviate the information overload problem and have become an important information technology in the era of big data. However, data sparsity, cold start, and limited interpretability remain key technical difficulties that limit the wide application of recommender systems. Reinforcement learning is an interactive learning technique: by interacting with users and obtaining feedback, it can dynamically model user preferences, capture their interest drift in real time, and better address these classical issues faced by traditional recommender systems. Reinforcement learning has consequently become a hot research topic in the field of recommender systems. Taking a survey perspective, this paper first briefly reviews recommender systems and reinforcement learning and, on that basis, analyzes how reinforcement learning can improve recommender systems. It then gives a general overview and summary of reinforcement learning based recommender systems in recent years, and further summarizes research on traditional reinforcement learning based recommendation and deep reinforcement learning based recommendation, respectively. The paper also summarizes the research frontiers of reinforcement learning based recommendation in recent years and their applications. Finally, it analyzes future development trends and applications of reinforcement learning in recommender systems.

Key words: Deep reinforcement learning, Markov decision process, Multi-armed bandits, Recommender systems, Reinforcement learning

CLC Number: TP183

YU Li, DU Qi-han, YUE Bo-yan, XIANG Jun-yao, XU Guan-yu, LENG You-fang. Survey of Reinforcement Learning Based Recommender Systems [J]. Computer Science, 2021, 48(10): 1-18.

