Computer Science ›› 2025, Vol. 52 ›› Issue (1): 80-86. doi: 10.11896/jsjkx.240900075

• Research and Application of Large Language Model Techniques •


  • Corresponding author: ZHOU Yuan (yuaanzhou@outlook.com)
  • About author: YAN Yusong (yanyusong23@nudt.edu.cn)

COA Generation Based on Pre-trained Large Language Models

YAN Yusong1, ZHOU Yuan2, WANG Cong2, KONG Shengqi1, WANG Quan2, LI Minne2, WANG Zhiyuan2   

  1. College of Computer, National University of Defense Technology, Changsha 410005, China
    2. Intelligent Game and Decision Lab, Beijing 100000, China
  • Received:2024-09-12 Revised:2024-10-14 Online:2025-01-15 Published:2025-01-09
  • About author: YAN Yusong, born in 2001, Ph.D. candidate. His main research interests include reinforcement learning and intelligent decision-making.
    ZHOU Yuan, born in 1993, Ph.D., assistant researcher. Her main research interests include machine learning and intelligent decision-making.
  • Supported by:
    Young Scientists Fund of the National Natural Science Foundation of China (62102442) and National Natural Science Foundation of China (62402500).


Abstract: Focusing on empowering the command and control (C2) process with generative AI, we analyze the challenges of course of action (COA) generation in C2 and the application prospects of pre-trained large language models (LLMs). We then propose COA-Gen, a COA generation method based on pre-trained LLMs. First, a multi-round generation framework is designed to align the generated plans with the commander's objectives. Second, multi-factor Chinese prompt templates are constructed to integrate vast amounts of multi-source information. Finally, knowledge-augmented generation is introduced to address data scarcity in this specialized military domain and to improve the planning performance of the models. To validate the effectiveness of the generated plans, an emulation environment based on the StarCraft II engine and the "Tiger Claw" scenario is established. Experimental results demonstrate the robustness of the method and its adherence to the commander's intent, verifying the feasibility of using LLMs for COA generation. Additionally, different pre-trained models exhibit varying performance on the same task, indicating that the choice of model in real-world applications can lead to COAs with different styles, thereby affecting the final outcomes.
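The pipeline the abstract describes — assembling a multi-factor prompt from multi-source inputs, augmenting it with retrieved domain knowledge, and iterating generation rounds checked against the commander's intent — can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the paper's implementation: every name here (`Scenario`, `build_prompt`, the toy keyword `retrieve`, and the stubbed `llm` and `judge` callables) is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    intent: str            # commander's intent, in natural language
    terrain: str
    friendly_units: list
    enemy_units: list

def build_prompt(s: Scenario, knowledge: list, feedback: str = "") -> str:
    """Fold multi-source factors and retrieved knowledge into one prompt."""
    parts = [
        f"Intent: {s.intent}",
        f"Terrain: {s.terrain}",
        f"Friendly: {', '.join(s.friendly_units)}",
        f"Enemy: {', '.join(s.enemy_units)}",
        "Doctrine notes: " + " | ".join(knowledge),
    ]
    if feedback:
        parts.append(f"Revise the previous plan. Reviewer feedback: {feedback}")
    parts.append("Produce a numbered course of action.")
    return "\n".join(parts)

def retrieve(query: str, corpus: dict, k: int = 2) -> list:
    """Toy keyword overlap standing in for a real RAG retriever."""
    scored = sorted(corpus.items(),
                    key=lambda kv: -sum(w in kv[1] for w in query.split()))
    return [text for _, text in scored[:k]]

def generate_coa(scenario, corpus, llm, judge, max_rounds=3):
    """Multi-round loop: generate, judge against intent, feed back, repeat."""
    feedback = ""
    for _ in range(max_rounds):
        knowledge = retrieve(scenario.intent, corpus)
        draft = llm(build_prompt(scenario, knowledge, feedback))
        ok, feedback = judge(draft, scenario.intent)
        if ok:
            return draft
    return draft  # best effort after the round limit
```

In a real system the `llm` callable would wrap a pre-trained model's chat API, `retrieve` would query a vector store over doctrine documents, and `judge` could itself be an LLM scoring intent alignment.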

Key words: Large language model, Generative AI, Intelligent decision-making, Command and control, Course of action

CLC number: TP399