Computer Science ›› 2026, Vol. 53 ›› Issue (3): 23-32.doi: 10.11896/jsjkx.250900173
• Intelligent Information System Based on AGI Technology •
WANG Zhibin1, LI Shipeng1,2, ZHOU Yuhang1, LI Xue2, ZHANG Zhonghui1, JIANG Zhiwei1, GU Rong1, TIAN Chen1, CHEN Guihai1, ZHONG Sheng1