Computer Science, 2026, Vol. 53, Issue (1): 51-57. doi: 10.11896/jsjkx.250800033
• Research and Application of Large Language Model Technology •
WAN Shenghua, XU Xingye, GAN Le, ZHAN Dechuan