大模型驱动的形式化定理证明:综述与展望

doi:10.11896/jsjkx.251000067

Abstract

Abstract: Theorem proving,as the intersection of logic and computer science,not only lays the formal foundation for modern mathematical reasoning,but also serves as a litmus test for evaluating the logical reasoning ability of artificial intelligence,while underpinning the fundamental demand for high reliability in software engineering.However,traditional theorem proving relies heavily on rigorous logical reasoning and labor-intensive human-computer interaction,long constrained by limited automation,insufficient reasoning efficiency,and strong dependence on expert experience.With the rapid development of large language models(LLM),their breakthrough capabilities in natural language understanding,code generation,and logical reasoning provide new opportunities to enhance the automation and intelligence of theorem proving.This paper systematically reviews the current state and emerging trends of LLM-driven formal theorem proving,focusing on two major application scenarios:1)In interactive theorem proving,we analyze how existing works alleviate the significant manual overhead,and take the Prover series in the Lean language as an example to systematically summarize its technical evolution path.2)In automated theorem proving,we explore how large language models combine techniques such as static analysis and verifier feedback to automatically generate formal specifications including function contracts and loop invariants,thereby significantly lowering the verification barrier.Finally,the paper highlights common challenges in this field,including specification completeness,reasoning reliability,data scarcity,and tool chain integration,and discusses future directions for research and development.

Key words: Theorem proving, Formal methods, Large language models, Formal verification, Formal specification

CLC Number:

TP311

HU Junjie, CHEN Yujie, HU Yikun, WEN Cheng, CAO Jialun, MA Zhi, SU Jie, SUN Weidi, TIAN Cong, QIN Shengchao. Formal Theorem Proving Empowered by Large Language Model:Survey and Perspectives[J].Computer Science, 2026, 53(4): 1-23.

References

[1]ROBINSON J A,VORONKOV A.Handbook of automated reasoning[M].Elsevier,2001.
[2]HAVELUND K,LOWRY M,PENIX J.Formal analysis of aspace-craft controller using SPIN[J].IEEE Transactions on Software Engineering,2002,27(8):749-765.
[3]SILVA L C,ALMEIDA H O,PERKUSICH A,et al.A model-based approach to support validation of medical cyber-physical systems[J].Sensors,2015,15(11):27625-27670.
[4]APPICHARLA S.The Boeing 737 Max 8 Crashes:System-based Approach to Safety－A Different Perspective[J].Safety-Critical Systems eJournal,2023,2(1):93-100.
[5]VAL O O,KOLADE T M,GBADEBO M O,et al.Strengthening cybersecurity measures for the defense of critical infrastructure in the United States[J].Asian Journal of Research in Computer Science,2024,17(11):25-45.
[6]National Highway Traffic Safety Administration.Part 573 Safety Recall Report 23V-862:Autopilot Controls Insufficient to Prevent Misuse[EB/OL].https://static.nhtsa.gov/odi/rcl/2023/RCLRPT-23V862-5632.PDF.
[7]DIJKSTRA E W.Notes on structured programming,Vol.70-WSK-03[R].Technische Hogeschool Eindhoven,1971.
[8]HOARE C A R.An axiomatic basis for computer programming[J].Communications of the ACM,1969,12(10):576-580.
[9]KLEIN G,ELPHINSTONE K,HEISER G,et al.seL4:Formal verification of an OS kernel[C]//Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles.2009:207-220.
[10]JIANG A Q,WELLECK S,ZHOU J P,et al.Draft,sketch,and prove:Guiding formal theorem provers with informal proofs[J].arXiv:2210.12283,2022.
[11]CHEN L,GU J,HUANG L,et al.Seed-prover:Deep and broad reasoning for automated theorem proving[J].arXiv:2507.23726,2025.
[12]WANG M,TANG Y,WANG J,et al.Premise selection for theorem proving by deep graph embedding[C]//Advances in Neural Information Processing Systems.2017.
[13]POLU S,SUTSKEVER I.Generative language modeling for automated theorem proving[J].arXiv:2009.03393,2020.
[14]LAMPLE G,LACROIX T,LACHAUX M A,et al.Hypertree proof search for neural theorem proving[J].Advances in Neural Information Processing Systems,2022,35:26337-26349.
[15]BROWN T,MANN B,RYDER N,et al.Language models are few-shot learners[J].Advances in Neural Information Proces-sing Systems,2020,33:1877-1901.
[16]LEWIS P,PEREZ E,PIKTUS A,et al.Retrieval-augmentedgeneration for knowledge-intensive nlp tasks[J].Advances in Neural Information Processing Systems,2020,33:9459-9474.
[17]RAFFEL C,SHAZEER N,ROBERTS A,et al.Exploring the limits of transfer learning with a unified text-to-text transformer[J].Journal of Machine Learning Research,2020,21(140):1-67.
[18]OUYANG L,WU J,JIANG X,et al.Training language models to follow instructions with human feedback[J].Advances in Neural Information Processing Systems,2022,35:27730-27744.
[19]WOHLIN C.Guidelines for snowballing in systematic literature studies and a replication in software engineering[C]//Procee-dings of the 18th International Conference on Evaluation and Assessment in Software Engineering.2014:1-10.
[20]HUTH M,RYAN M.Logic in Computer Science:Modelling and reasoning about systems[M].Cambridge University Press,2004.
[21]WOODCOCK J,LARSEN P G,BICARREGUI J,et al.Formal methods:Practice and experience[J].ACM Computing Surveys,2009,41(4):1-36.
[22]SIMPSON M S,BARUA R K.MemSafe:ensuring the spatial and temporal memory safety of C at runtime[J].Software:Practice and Experience,2013,43(1):93-128.
[23]LEINO K R M.Dafny:An automatic program verifier for functional correctness[C]//International Conference on Logic for Programming Artificial Intelligence and Reasoning.Berlin:Springer,2010:348-370.
[24]FILLIÂTRE J C,PASKEVICH A.Why3－where programsmeet provers[C]//European Symposium on Programming.Berlin:Springer,2013:125-128.
[25]HUET G,KAHN G,PAULIN-MOHRING C.The coq proofassistant a tutorial[J].Rapport Technique,1997,178:113.
[26]NIPKOW T,WENZEL l M,PAULSON L C.Isabelle/HOL:a proof assistant for higher-order logic[M].Berlin:Springer,2002.
[27]DE MOURA L,KONG S,AVIGAD J,et al.The Lean theorem prover(system description)[C]//International Conference on Automated Deduction.Cham:Springer,2015:378-388.
[28]TURING A M.On computable numbers,with an application to the Entscheidungs problcm[C]//Proceedings of the London Mathematical Society.1937:230-265.
[29]HALES T,ADAMS M,BAUER G,et al.A formal proof of the Kepler conjecture[J].arXiv:1501.02155,2015.
[30]LIN X,CAO Q,HUANG Y,et al.FVEL:Interactive formalverification environment with large language models via theorem proving[J].Advances in Neural Information Processing Systems,2024,37:54932-54946.
[31]DONG K,MAHANKALI A,MA T.Formal theorem proving by rewarding llms to decompose proofs hierarchically[J].arXiv:2411.01829,2024.
[32]THOMPSON K,SAAVEDRA N,CARROTT P,et al.Rango:Adaptive retrieval-augmented proving for automated software verification[J].arXiv:2412.14063,2024.
[33]KOZYREV A,SOLOVEV G,KHRAMOV N,et al.CoqPilot,a plugin for LLM-based generation of proofs[C]//Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering.2024:2382-2385.
[34]XIN H,GUO D,SHAO Z,et al.Deepseek-prover:Advancingtheorem proving in llms through large-scale synthetic data[J].arXiv:2405.14333,2024.
[35]XIN H,REN Z Z,SONG J,et al.Deepseek-prover-v1.5:Harnessing proof assistant feedback for reinforcement learning and monte-carlo tree search[J].arXiv:2408.08152,2024.
[36]REN Z Z,SHAO Z,SONG J,et al.Deepseek-prover-v2:Advancing formal mathematical reasoning via reinforcement learning for subgoal decomposition[J].arXiv:2504.21801,2025.
[37]WANG H,UNSAL M,LIN X,et al.Kimina-prover preview:Towards large formal reasoning models with reinforcement learning[J].arXiv:2504.11354,2025.
[38]XIN R,XI C,YANG J,et al.Bfs-prover:Scalable best-first tree search for llm-based automatic theorem proving[J].arXiv:2502.03438,2025.
[39]YANG K,SWOPE A,GU A,et al.Leandojo:Theorem proving with retrieval-augmented language models[J].Advances in Neural Information Processing Systems,2023,36:21573-21612.
[40]YING H,ZHANG S,LI L,et al.Internlm-math:Open mathlarge language models toward verifiable reasoning[J].arXiv:2402.06332,2024.
[41]WANG R,ZHANG J,JIA Y,et al.Theoremllama:Transfor-ming general-purpose llms into lean4 experts[J].arXiv:2407.03203,2024.
[42]LIN H,SUN Z,WELLECK S,et al.Lean-star:Learning to interleave thinking and proving[J].arXiv:2407.10040,2024.
[43]WU Z,WANG J,LIN D,et al.Lean-github:Compiling github lean repositories for a versatile lean prover[J].arXiv:2407.17227,2024.
[44]GLOECKLE F,LIMPERG J,SYNNAEVE G,et al.ABEL:Sample efficient online reinforcement learning for neural theorem proving[C]//The 4th Workshop on Mathematical Reaso-ning and AI at NeurIPS’24.2024.
[45]WU Z,HUANG S,ZHOU Z,et al.Internlm2.5-stepprover:Advancing automated theorem proving via expert iteration on large-scale lean problems[J].arXiv:2410.15700,2024.
[46]WU S,LU S,GONG Y,et al.Alchemy:Amplifying theorem-proving capability through symbolic mutation[J].arXiv:2410.15748,2024.
[47]LAMONT S,WALDER C,DEZFOULI A,et al.3D-Prover:Diversity Driven Theorem Proving With Determinantal Point Processes[J].arXiv:2410.11133,2024.
[48]LI Y,DU D,SONG L,et al.Hunyuanprover:A scalable datasynthesis framework and guided tree search for automated theorem proving[J].arXiv:2412.20735,2024.
[49]DONG K,MA T.Stp:Self-play llm theorem provers with iterative conjecturing and proving[J].arXiv:2502.00212,2025.
[50]LIN Y,TANG S,LYU B,et al.Goedel-prover:A frontier model for open-source automated theorem proving[J].arXiv:2502.07640,2025.
[51]ZHANG J,WANG Q,JI X,et al.Leanabell-prover:Posttraining scaling in formal reasoning[J].arXiv:2504.06122,2025.
[52]SHAN H,XIONG B,QI N,et al.FrameProver:Leveraging Formal Frameworks in Building Proofs for Enhanced Inference[C]//International Conference on Intelligent Computing.Singapore:Springer,2025:188-200.
[53]JI X,LIU Y,WANG Q,et al.Leanabell-prover-v2:Verifier-integrated reasoning for formal theorem proving via reinforcement learning[J].arXiv:2507.08649,2025.
[54]LIN Y,TANG S,LYU B,et al.Goedel-prover-v2:Scaling formal theorem proving with scaffolded data synthesis and self-correction[J].arXiv:2508.03613,2025.
[55]XIN R,ZHENG Z,NIE Y,et al.Scaling up Multi-Turn Off-Po-licy RL and Multi-Agent Tree Search for LLM Step-Provers[J].arXiv:2509.06493,2025.
[56]LU M,DELAWARE B,ZHANG T.Proof automation withlarge language models[C]//Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering.2024:1509-1520.
[57]QIN J X,DU A,ZHANG D F,et al.Can Large Language Mo-dels Verify System Software? A Case Study Using FSCQ as a Benchmark[C]//Workshop on Hot Topics in Operating Systems(HOTOS’25).New York:ACM,2025.
[58]LIU X P,LIU H Z,YI X D,et al.LLM-Enhanced Theorem Proving with Term Explanation and Tactic Parameter Repair[C]//15th Asia-Pacific Symposium on Internetware(Internetware’24).New York:ACM,2024.
[59]THAKUR A,LEE J,TSOUKALAS G,et al.CLEVER:A Curated Benchmark for Formally Verified Code Generation[J].arXiv:2505.13938,2025.
[60]THAKUR A,WEN Y,CHAUDHURI S.A language-agent approach to formal theorem-proving[EB/OL].https://openreview.net/pdf?id=XCMbagV0No.
[61]LIU H,SUN J,LI Z,et al.ProofAug:Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis[C]//Forty-second International Conference on Machine Learning.2025.
[62]KUMARAPPAN A,TIWARI M,SONG P,et al.LeanAgent:Lifelong Learning for Formal Theorem Proving[J].arXiv:2410.06209,2024.
[63]CAI Y F,HOU Z,SANAN D,et al.Automated Program Refinement:Guide and Verify Code Large Language Model with Refinement Calculus[C]//Proceedings of the ACM on Programming Languages.2025:2057-2089.
[64]TAHAT A,HARDIN D,PETZ A,et al.Proof repair utilizing large language models:a case study on the Copland remote attestation proofbase[C]//International Conference on Bridging the Gap between AI and Reality.Cham:Springer,2024:145-166.
[65]ANIVA L,SUN C,MIRANDA B,et al.Pantograph:A machine-to-machine interaction interface for advanced theorem proving,high level reasoning,and data extraction in lean 4[C]//International Conference on Tools and Algorithms for the Construction and Analysis of Systems.Cham:Springer,2025:104-123.
[66]HAN S,HUR T,HUR Y,et al.Simplifying Formal Proof-Generating Models with ChatGPT and Basic Searching Techniques[C]//Intelligent Computing-Proceedings of the Computing Conference.Cham:Springer,2025:205-222.
[67]DOAN L,NGUYEN T V.AI-Assisted Autoformalization ofCombinatorics Problems in Proof Assistants[C]//2025 IEEE/ACM 47th International Conference on Software Engineering:New Ideas and Emerging Results(ICSE-NIER).IEEE,2025:1-5.
[68]KRIPNER M,ŠUSTR M,STRAKA M.LeanTree:Accelerating White-Box Proof Search with Factorized States in Lean 4[J].arXiv:2507.14722,2025.
[69]BABA K,LIU C,KURITA S,et al.Prover Agent:An Agent-based Framework for Formal Mathematical Proofs[J].arXiv:2506.19923,2025.
[70]CAO C,LI M,DAI J,et al.Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving[J].arXiv:2506.17104,2025.
[71]ASHER J.LeanExplore:A search engine for Lean 4 declarations[J].arXiv:2506.11085,2025.
[72]PATEL M,BHATTACHARYYA R,LU T,et al.LeanTutor:A Formally-Verified AI Tutor for Mathematical Proofs[J].arXiv:2506.08321,2025.
[73]KIRTANIA S.Activation Steering in Neural Theorem Provers[J].arXiv:2502.15507,2025.
[74]YANG T,YAN M,ZHAO H,et al.LemmaHead:RAG Assisted Proof Generation Using Large Language Models[J].arXiv:2501.15797,2025.
[75]BUALI M,HOEHNDORF R.Towards Automated FunctionalEquation Proving:A Benchmark Dataset and A Domain-Specific In-Context Agent[J].arXiv:2407.14521,2024.
[76]SONG P,YANG K,ANANDKUMAR A.Lean copilot:Large language models as copilots for theorem proving in lean[J].ar-Xiv:2404.12534,2024.
[77]KASIBATLA S R,AGARWAL A,BRUN Y,et al.Cobble-stone:Iterative Automation for Formal Verification[J].arXiv:2410.19940,2024.
[78]ZHOU J P,STAATS C,LI W,et al.Don’t Trust:Verify－Grounding LLM Quantitative Reasoning with Autoformalization[J].arXiv:2403.18120,2024.
[79]GADGIL S,TADIPATRI A R,AGRAWAL A,et al.Towards automating formalisation of theorem statements using large language models[C]//36th Conference on Neural Information Processing Systems(NeurIPS 2022) Workshop on MATH-AI.2022.
[80]MIRANDA B,ZHOU Z,NIE A,et al.VeriBench:End-to-EndFormal Verification Benchmark for AI Code Generation in Lean 4[C]//2nd AI for Math Workshop@ ICML 2025.2025.
[81]TSOUKALAS G,LEE J,JENNINGS J,et al.Putnambench:A multilingual competition-mathematics benchmark for formal theorem-proving[C]//AI for Math Workshop@ ICML 2024.2024.
[82]CAO C,SONG L,LI Z,et al.Reviving DSP for Advanced Theorem Proving in the Era of Reasoning Models[J].arXiv:2506.11487,2025.
[83]ZHOU Y,ZHAO J,ZHANG Y,et al.Solving formal math problems by decomposition and iterative reflection[J].arXiv:2507.15225,2025.
[84]CHEN B,BABIKIAN A A,FENG S,et al.LLM-based Satisfiability Checking of String Requirements by Consistent Data and Checker Generation[J].arXiv:2506.16639,2025.
[85]PAN L,ALBALAK A,WANG X,et al.Logic-LM:Empowering Large Language Models with Symbolic Solvers for Faithful Lo-gical Reasoning[J].arXiv:2305.12295,2023.
[86]YE X,CHEN Q,DILLIG I,et al.Satlm:Satisfiability-aided language models using declarative prompting[J].Advances in Neural Information Processing Systems,2023,36:45548-45580.
[87]KARIA R,DOBHAL D,BRAMBLETT D,et al.Can LLMs translate SATisfactorily? Assessing LLMs in Generating and Interpreting Formal Specifications[J/OL].https://pulkitverma.net/assets/pdf/kdbvs_aia24/kdbvs_aia24.pdf.
[88]SZEIDER S.MCP-Solver:Integrating Language Models withConstraint Programming Systems[J].arXiv:2501.00539,2025.
[89]HAZRA R,VENTURATO G,MARTIRES P Z D,et al.Canlarge language models reason? A characterization via 3-SAT[J].arXiv:2408.07215,2024.
[90]MARINO R.Fast Analysis of the OpenAI O1-Preview Model in Solving Random K-SAT Problem:Does the LLM Solve the Problem Itself or Call an External SAT Solver?[J].arXiv:2409.11232,2024.
[91]ALFEZEA Y,AL-YOUSEFI A,AL-JARRAH M,et al.Analyzing the Impact of Fine-Tuning a Large Language Model on the 3-SAT Problem[C]//2025 1st International Conference on Computational Intelligence Approaches and Applications(ICCIAA).2025:1-8.
[92]SCHIDLER A,SZEIDER S.Extracting problem structure with LLMs for optimized SAT local search[J].arXiv:2501.14630,2025.
[93]CLARKE E,BIERE A,RAIMI R,et al.Bounded model chec-king using satisfiability solving[J].Formal methods in system design,2001,19(1):7-34.
[94]JANßEN C,RICHTER C,WEHRHEIM H.Can ChatGPT support software verification?[C]//International Conference on Fundamental Approaches to Software Engineering.Cham:Springer,2024:266-279.
[95]KAMATH A,MOHAMMED N,SENTHILNATHAN A,et al.Leveraging LLMs for Program Verification[C]//2024 Formal Methods in Computer-Aided Design(FMCAD).2024:107-118.
[96]HEIZMANN M,HOENICKE J,PODELSKI A.Refinement of trace abstraction[C]//International Static Analysis Symposium.Berlin:Springer,2009:69-85.
[97]WEN C,CAO J,SU J,et al.Enchanting program specification synthesis by large language models using static analysis and program verification[C]//International Conference on Computer Aided Verification.Cham:Springer,2024:302-328.
[98]LIU C,WU X,FENG Y,et al.Towards general loop invariant generation:A benchmark of programs with memory manipulation[J].Advances in Neural Information Processing Systems,2024,37:129120-129145.
[99]MA L,LIU S,LI Y,et al.Specgen:Automated generation offormal program specifications via large language models[J].arXiv:2401.08807,2024.
[100]LIU R,LI G,CHEN M,et al.Enhancing Automated Loop Invariant Generation for Complex Programs with Large Language Models[J].arXiv:2412.10483,2024.
[101]ZHONG S,ZHU J,TIAN Y,et al.RAG-Verus:Repository-Level Program Verification with LLMs using Retrieval Augmented Generation[J].arXiv:2502.05344,2025.
[102]YANG C,LI X,MISU M R H,et al.AutoVerus:Automatedproof generation for Rust code[J].arXiv:2409.13082,2024.
[103]CHAKRABORTY M,PIRKELBAUER P,YI Q.FormalSpec-Cpp:A Dataset of C++ Formal Specifications created using LLMs[C]//2025 IEEE/ACM 22nd International Conference on Mining Software Repositories(MSR).IEEE,2025:758-762.
[104]CAO J,LU Y,LI M,et al.From Informal to Formal--Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs[J].arXiv:2501.16207,2025.
[105]AGGARWAL P,PARNO B,WELLECK S.AlphaVerus:Bootstrapping formally verified code generation through self-improving translation and treefinement[J].arXiv:2412.06176,2024.
[106]LI Y,PARSERT J,POLGREEN E.Guiding enumerative program synthesis with large language models[C]//International Conference on Computer Aided Verification.Cham:Springer,2024:280-301.
[107]MISU M R H,LOPES C V,MA I,et al.Towards ai-assisted synthesis of verified dafny methods[C]//Proceedings of the ACM on Software Engineering.2024:812-835.
[108]LOUGHRIDGE C,SUN Q,AHRENBACH S,et al.Dafny-Bench:A benchmark for formal software verification[J].arXiv:2406.08467,2024.
[109]SUN C,AGASHE V,CHAKRABORTY S,et al.ClassInvGen:Class Invariant Synthesis using Large Language Models[J].arXiv:2502.18917,2025.
[110]MUGNIER E,GONZALEZ E A,POLIKARPOVA N,et al.Laurel:Unblocking Automated Verification with Large Language Models[C]//Proceedings of the ACM on Programming Languages.2025:1519-1545.
[111]JHA M,WAN J X,ZHANG H,et al.PREFACE - A Reinforcement Learning Framework for Code Verification via LLM Prompt Repair[C]//Great Lakes Symposium on VLSI 2025(GLSVLSI’25).New York:ACM,2025.
[112]CAO W,WU G,XU T,et al.Clause2Inv:A Generate-Combine-Check Framework for Loop Invariant Inference[C]//Procee-dings of the ACM on Software Engineering.2025:1009-1030.
[113]TANEJA J,LAIRD A,YAN C,et al.Llm-vectorizer:Llm-based verified loop vectorizer[C]//Proceedings of the 23rd ACM/IEEE International Symposium on Code Generation and Optimization.2025:137-149.
[114]RUBIO-MEDRANO C E,KOTAK A,WANG W,et al.Pairing human and artificial intelligence:Enforcing access control policies with llms and formal specifications[C]//Proceedings of the 29th ACM Symposium on Access Control Models and Technologies.2024:105-116.
[115]SILVA Á F,MENDES A,FERREIRA J F.Leveraging largelanguage models to boost Dafny’s developers productivity[C]//Proceedings of the 2024 IEEE/ACM 12th International Confe-rence on Formal Methods in Software Engineering(FormaliSE).2024:138-142.
[116]LI Y C,ZETZSCHE S,SOMAYYAJULA S.Dafny as Verification-Aware Intermediate Language for Code Generation[C]//Proceedings of Dafny Workshop 2025.Denver:Dafny Workshop Organizing Committee,2025:1-8.
[117]YAN C,CHE F,HUANG X,et al.Re:Form--Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs:A Preliminary Study on Dafny[J].arXiv:2507.16331,2025.
[118]WU V,MENDES A,ABREU A.Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs[J].arXiv:2507.03659,2025.
[119]DENG X,ZHONG S,VENERIS A,et al.VerifyThisBench:Generating Code,Specifications,and Proofs All at Once[J].arXiv:2505.19271,2025.
[120]LI S,JIANG J,ZHAO T,et al.OSVBench:Benchmarking LLMs on Specification Generation Tasks for Operating System Verification[J].arXiv:2504.20964,2025.
[121]POESIA G,LOUGHRIDGE C,AMIN N.dafny-annotator:Ai-assisted verification of dafny programs[J].arXiv:2411.15143,2024.
[122]REGO M,FAN W,HU X,et al.Evaluating the Ability of GPT-4o to Generate Verifiable Specifications in VeriFast[C]//2025 IEEE/ACM Second International Conference on AI Foundation Models and Software Engineering(Forge).IEEE,2025:246-251.
[123]MUKHERJEE P,DELAWARE B.Towards automated verification of llm-synthesized c programs[J].arXiv:2410.14835,2024.
[124]SUN C,SHENG Y,PADON O,et al.Clover:Clo sed-loop ver ifiable code generation[C]//International Symposium on AI Verification.Cham:Springer,2024:134-155.
[125]YAO J,ZHOU Z,CHEN W,et al.Leveraging large language models for automated proof synthesis in rust[J].arXiv:2311.03739,2023.
[126]KING D,KOUTAVAS V,KOVÁCS L.Llm-Based Generation of Weakest Preconditions and Precise Array Invariants[C]//2025 IEEE/ACM 13th International Conference on Formal Methods in Software Engineering(FormaliSE).IEEE,2025:1-5.
[127]SEVENHUIJSEN M,ETEMADI K,NYBERG M.VeCoGen:Automating Generation of Formally Verified C Code with Large Language Models[C]//2025 IEEE/ACM 13th International Conference on Formal Methods in Software Engineering(FormaliSE).IEEE,2025:101-112.
[128]LAHIRI S K.Evaluating llm-driven user-intent formalization for verification-aware languages[C]//Conference on Formal Methods in Computer-Aided Design－FMCAD 2024.2024:142.
[129]BECKERT B,KLAMROTH J,PFEIFER W,et al.Towardscombining the cognitive abilities of large language models with the rigor of deductive progam verification[C]//International Symposium on Leveraging Applications of Formal Methods.Cham:Springer,2024:242-257.
[130]PASCOAL F J,TRIGO E,ABREU R.Automatic Generation of Loop Invariants in Dafny with Large Language Models[C]//International Conference on Fundamentals of Software Enginee-ring.Cham:Springer,2025:138-154.
[131]HUANG D,DHARIWAL P,SONG D,et al.Gamepad:A lear-ning environment for theorem proving[J].arXiv:1806.00608,2018.
[132]YANG K,DENG J.Learning to prove theorems via interacting with proof assistants[C]//International Conference on Machine Learning.PMLR,2019:6984-6994.
[133]SANCHEZ-STERN A,ALHESSI Y,SAUL L,et al.Generating correctness proofs with neural networks[C]//Proceedings of the 4th ACM SIGPLAN International Workshop on Machine Learning and Programming Languages.2020:1-10.
[134]CHEN M,TWOREK J,JUN H,et al.Evaluating large language models trained on code[J].arXiv:2107.03374,2021.
[135]REICHEL T,HENDERSON R,TOUCHET A,et al.Proof repair infrastructure for supervised models:Building a large proof repair dataset[C]//14th International Conference on Interactive Theorem Proving(ITP 2023).Schloss Dagstuhl－Leibniz-Zentrum für Informatik,2023:1-26:20.
[136]RUTE J,OLŠÁK M,BLAAUWBROEK L,et al.Graph2Tac:Learning hierarchical representations of math concepts in theorem proving[EB/OL].https://openreview.net/pdf?id=XyB4VvF01X.
[137]FLORATH A.Enhancing formal theorem proving:a compre-hensive dataset for training AI models on Coq code[J].arXiv:2403.12627,2024.
[138]PROOFBASE R A,TAHAT A,HARDIN D,et al.Proof Repair Utilizing Large Language Models:A Case Study on the Copland[C]//Bridging the Gap Between AI and Reality:Second International Conference.Springer,2024:145.
[139]LI W,YU L,WU Y,et al.Isarstep:a benchmark for high-level mathematical reasoning[J].arXiv:2006.09265,2020.
[140]FIRST E,RABE M N,RINGER T,et al.Baldur:Whole-proof generation and repair with large language models[C]//Procee-dings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering.2023:1229-1241.
[141]JIANG A Q,LI W,HAN J M,et al.Lisa:Language models of isabelle proofs[C]//6th Conference on Artificial Intelligence and Theorem Proving.2021:378-392.
[142]ZHENG K,HAN J M,POLU S.Minif2f:a cross-system benchmark for formal olympiad-level mathematics[J].arXiv:2109.00110,2021.
[143]MIKUŁA M,TWORKOWSKI S,ANTONIAK S,et al.Mag-nushammer:A transformer-based approach to premise selection[J].arXiv:2303.04488,2023.
[144]ZHANG L,LU S,DUAN N.Selene:Pioneering automated proof in software verification[J].arXiv:2401.07663,2024.
[145]HAN J M,RUTE J,WU Y,et al.Proof artifact co-training for theorem proving with language models[J].arXiv:2102.06203,2021.
[146]XIONG J,SHEN J,YUAN Y,et al.Trigo:Benchmarking formal mathematical proof reduction for generative language mo-dels[J].arXiv:2310.10180,2023.
[147]AZERBAYEV Z,PIOTROWSKI B,SCHOELKOPF H,et al.Proofnet:Autoformalizing and formally proving undergraduate-level mathematics[J].arXiv:2302.12433,2023.
[148]LIU C,SHEN J,XIN H,et al.Fimo:A challenge formal dataset for automated theorem proving[J].arXiv:2309.04295,2023.
[149]NESTEROV V,KAPUSHEV Y,BURTSEV M.Lean4trace:Data augmentation for neural theorem proving in Lean[C]//AI for Math Workshop@ ICML 2024.2024.
[150]YING H,WU Z,GENG Y,et al.Lean workbook:A large-scale lean problem set formalized from natural language math problems[J].Advances in Neural Information Processing Systems,2024,37:105848-105863.
[151]LU J,WAN Y,LIU Z,et al.Process-driven autoformalization in lean 4[J].arXiv:2406.01940,2024.
[152]HU J,ZHU T,WELLECK S.miniCTX:Neural theorem proving with(long-) contexts[J].arXiv:2408.03350,2024.
[153]ONDA N,KASAURA K,ORIIKE Y,et al.LeanConjecturer:Automatic Generation of Mathematical Conjectures for Theorem Proving[J].arXiv:2506.22005,2025.
[154]LIU J,LIN X,BAYER J,et al.CombiBench:BenchmarkingLLM capability for combinatorial mathematics[J].arXiv:2505.03171,2025.
[155]ZHAO H,GENG Y,TANG S,et al.Ineq-Comp:Benchmarking human-intuitive compositional reasoning in automated theorem proving on inequalities[J].arXiv:2505.12680,2025.
[156]SUN J,TANG Y,LI A,et al.Enumerate-Conjecture-Prove:Formally Solving Answer-Construction Problems in Math Competitions[J].arXiv:2505.18492,2025.
[157]XIN H,LI L,JIN X,et al.APE-Bench I:Towards File-level Automated Proof Engineering of Formal Math Libraries[J].arXiv:2504.19110,2025.

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed

Comments

Recommended 0

No Suggested Reading articles found!

Formal Theorem Proving Empowered by Large Language Model:Survey and Perspectives

PDF (PC)

Abstract

Cite this article

share this article

References

Related Articles 15

Metrics

Comments

Recommended 0

[1]	CHEN Hongxiu, ZENG Xia, LIU Zhiming, ZHAO Hengjun. Automatic Theorem Proving Based on Pre-trained Language Models and Unification [J]. Computer Science, 2026, 53(4): 40-47.
[2]	LIU Suyi, LIU Qi, GAO Weibo. Agent4Stu:Efficient LLM-based Student Answer Behavior Simulation Agent [J]. Computer Science, 2026, 53(4): 347-355.
[3]	XU Cheng, LIU Yuxuan, WANG Xin, ZHANG Cheng, YAO Dengfeng, YUAN Jiazheng. Review of Speech Disorder Assessment Methods Driven by Large Language Models [J]. Computer Science, 2026, 53(3): 307-320.
[4]	LI Wenli, FENG Xiaonian, QIAN Tieyun. Few-shot Continuous Toxicity Detection Based on Large Language Model Augmentation [J]. Computer Science, 2026, 53(3): 321-330.
[5]	LIU Lilong, LIU Guoming, QI Baoyuan, DENG Xueshan, XUE Dizhan, QIAN Shengsheng. Efficient Inference Techniques of Large Models in Real-world Applications:A Comprehensive Survey [J]. Computer Science, 2026, 53(1): 12-28.
[6]	SHAO Xinyi, ZHU Jingwei, ZHANG Liang. LLM-based Business Process Adaptation Method to Respond Long-tailed Changes [J]. Computer Science, 2026, 53(1): 29-38.
[7]	LI Maolin, LIN Jiajie, YANG Zhenguo. Confidence-guided Prompt Learning for Multimodal Aspect-level Sentiment Analysis [J]. Computer Science, 2025, 52(7): 241-247.
[8]	CHEN Jinyin, XI Changkun, ZHENG Haibin, GAO Ming, ZHANG Tianxin. Survey of Security Research on Multimodal Large Language Models [J]. Computer Science, 2025, 52(7): 315-341.
[9]	LI Jiangxu, CHEN Zemao, ZHANG Liqiang. Lightweight Authentication and Key Agreement Protocol for Cloud-assisted Smart Home Communication [J]. Computer Science, 2025, 52(7): 342-352.
[10]	LI Bo, MO Xian. Application of Large Language Models in Recommendation System [J]. Computer Science, 2025, 52(6A): 240400097-7.
[11]	HU Caishun. Study on Named Entity Recognition Algorithms in Audit Domain Based on Large LanguageModels [J]. Computer Science, 2025, 52(6A): 240700190-4.
[12]	GAO Hongkui, MA Ruixiang, BAO Qihao, XIA Shaojie, QU Chongxiao. Research on Hybrid Retrieval-augmented Dual-tower Model [J]. Computer Science, 2025, 52(6): 324-329.
[13]	WANG Zhiyi, HU Jun, XU Heng. Transition and Verification Method from RSML^-e to Lustre Model for Flight Mode Transition [J]. Computer Science, 2025, 52(12): 48-59.
[14]	LI Hao, YANG Yumeng, ZHAO Boyang, ZHENG Puqi, LIN Hongfei. Adverse Drug Reaction Relationship Extraction Based on Chain of Thought Enhancement UnderHigh and Low Resources [J]. Computer Science, 2025, 52(12): 224-230.
[15]	HUANG Haixin, XU Chenglong, FU Yao. Research on Structured Pruning Algorithm Based on Information Fusion [J]. Computer Science, 2025, 52(11A): 241000041-6.