Computer Science, 2026, Vol. 53, Issue (6A): 250400113-8. doi: 10.11896/jsjkx.250400113

• Computer Software & Architecture •

Fuzzing Driver Generation Based on Large Language Models

WEI Qing, ZHANG Yupeng, LIU Shaoxun, ZHANG Jinfeng, ZHANG Yuezhong, CHEN Haoyang   

  1. Purple Mountain Laboratories, Nanjing 210000, China
  • Online: 2026-06-16 Published: 2026-06-12
  • About author: WEI Qing, born in 1987, master's degree, intermediate engineer. His main research interests include security and fuzz testing.
    ZHANG Yupeng, born in 1992, Ph.D. His main research interests include safety and security of intelligent unmanned systems.
  • Supported by:
    Data Security Technology for Multi-Network Integrated System of Vehicle-Road-Cloud Coordination (2023YFB2504800) and Development and Demonstration of Endogenous Security in Multi-Agent Systems for Open Environments (ZL042501).

Abstract: As software systems become more widespread, their security issues have become increasingly prominent. Fuzzing, an effective vulnerability detection technique, plays a crucial role in software development. However, traditional fuzzing tools rely on manually written driver programs, which makes them inefficient and leaves interface coverage insufficient. To address these challenges, this study proposes an automated fuzzing driver generation method based on large language models (LLMs). The approach uses an intelligent code parsing module to extract function interfaces and structure definitions, leverages the code generation capabilities of LLMs to automatically produce driver programs compliant with the Honggfuzz framework, and introduces a feedback-based correction mechanism to improve the driver generation success rate. Experimental results show that the proposed method achieves a 100% driver generation success rate and 100% fuzzing interface coverage on the open-source cJSON library and an in-house TBox project. For the open-source Libtiff library, the driver generation success rate reaches 76.2%, with a fuzzing interface coverage of 40.5%. Ablation studies on Qwen2.5-coder (14B parameters) and Qwen2.5-coder (32B parameters) indicate that the feedback correction mechanism further improves the driver generation success rate, by 5.9% and 2.4%, respectively. The method significantly raises the automation level and coverage of fuzzing, providing an efficient solution for vulnerability detection in complex software systems. Future work may focus on optimizing the code parsing module, refining prompt templates, and enhancing LLM adaptability to further improve the method's generalizability and vulnerability discovery capabilities.
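
The feedback-based correction mechanism described in the abstract can be pictured as a generate, compile, and retry loop. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the names generate_driver, ask_llm, try_compile, and max_rounds are hypothetical, the prompt wording is invented for illustration, and the LLM and compiler are passed in as plain callables so the loop can be exercised without a model or toolchain. It assumes that the feedback consists of compiler diagnostics and that the driver exposes an LLVMFuzzerTestOneInput entry point, one of the entry-point styles Honggfuzz accepts in persistent mode.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class DriverResult:
    source: Optional[str]  # final driver source, or None if every round failed
    attempts: int          # number of LLM calls that were made


def generate_driver(
    interface: str,
    ask_llm: Callable[[str], str],
    try_compile: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> DriverResult:
    """Ask the LLM for a driver and feed build errors back until it compiles.

    `ask_llm` maps a prompt to driver source code; `try_compile` returns
    (success, diagnostics). Both are hypothetical stand-ins supplied by the caller.
    """
    base_prompt = (
        "Write a C fuzzing driver for the Honggfuzz framework that exposes "
        "LLVMFuzzerTestOneInput and exercises this interface:\n"
        f"{interface}\n"
    )
    prompt = base_prompt
    for attempt in range(1, max_rounds + 1):
        source = ask_llm(prompt)
        ok, diagnostics = try_compile(source)
        if ok:
            return DriverResult(source, attempt)
        # Feedback step: show the model its previous output and the errors.
        prompt = (
            base_prompt
            + "\nYour previous driver failed to build:\n"
            + source
            + "\nCompiler diagnostics:\n"
            + diagnostics
            + "\nReturn a corrected driver.\n"
        )
    return DriverResult(None, max_rounds)
```

In practice, try_compile would wrap a call to the Honggfuzz compiler wrapper and ask_llm would wrap a request to a model such as Qwen2.5-coder; capping the number of rounds keeps a persistently failing interface from consuming unbounded model calls.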

Key words: Fuzz testing, Large language models, Driver generation, Automated testing, Feedback-based correction mechanism

CLC Number: TP311.5