Computer Science ›› 2026, Vol. 53 ›› Issue (4): 40-47. doi: 10.11896/jsjkx.251000066

• Interdisciplinary Integration of Artificial Intelligence and Theoretical Computer Science •

Automatic Theorem Proving Based on Pre-trained Language Models and Unification

CHEN Hongxiu, ZENG Xia, LIU Zhiming, ZHAO Hengjun   

  1. School of Computer and Information Science, School of Software, Southwest University, Chongqing 400715, China
  • Received: 2025-10-16  Revised: 2026-01-18  Online: 2026-04-15  Published: 2026-04-08
  • About author: CHEN Hongxiu, born in 2001, postgraduate, is a member of CCF (No.A12584G). His main research interests include automatic theorem proving and natural language processing.
    ZHAO Hengjun, born in 1985, Ph.D., associate professor, is a member of CCF (No.50534M). His main research interests are software theory and methods.
  • Supported by:
    National Natural Science Foundation of China (62272397, 62472362, 92582113) and Natural Science Foundation of Chongqing, China (CSTB2025NSCQ-LZX0019).

Abstract: Pre-trained language models (PLMs) have demonstrated considerable potential for proving theorems in formal environments such as Metamath. However, this potential remains difficult to translate into reliable reasoning ability. Existing approaches typically require PLMs to predict substitutions directly, drawn from an open and potentially infinite expression space; the absence of logical constraints in this process limits the models' generalization capability. To address this challenge, a new proof paradigm is introduced: work variables temporarily stand in for concrete terms during inference, and their concrete instances are later derived by the unification algorithm. This design lets the model focus on theorem selection rather than on generating unguided substitutions. Based on this paradigm, a dataset is extracted from the Metamath library and used to train UnProver (Unification-Driven Prover). In addition, several data augmentation strategies are designed to further enhance UnProver's performance. Experimental results show that UnProver consistently outperforms baseline methods and GPT-4o on the test sets, achieving superior proving capability and efficiency. Moreover, UnProver discovers six new and shorter proofs, which have been formally accepted into the official Metamath library.
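To make the proposed paradigm concrete, the following minimal Python sketch shows Robinson-style unification [8] instantiating work variables. The term representation (Var, App), the work-variable naming, and the example goal are illustrative assumptions for this note, not the paper's actual data structures or the Metamath grammar.

```python
# Minimal sketch of Robinson-style unification [8] with work variables.
# The Var/App representation and the example goal are illustrative
# assumptions, not the paper's implementation or the Metamath grammar.
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    """A work variable (e.g. W1) standing in for a not-yet-known term."""
    name: str

@dataclass(frozen=True)
class App:
    """A compound expression such as ( W1 + 0 ); constants have no args."""
    op: str
    args: tuple = ()

def walk(t, subst):
    """Chase substitution bindings until t is no longer a bound variable."""
    while isinstance(t, Var) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    """Occurs check: reject cyclic bindings such as W1 := ( W1 + 0 )."""
    t = walk(t, subst)
    if t == v:
        return True
    return isinstance(t, App) and any(occurs(v, a, subst) for a in t.args)

def unify(a, b, subst=None):
    """Return a most general unifier of a and b, or None on a clash."""
    subst = dict(subst or {})
    stack = [(a, b)]
    while stack:
        s, t = stack.pop()
        s, t = walk(s, subst), walk(t, subst)
        if s == t:
            continue
        if isinstance(s, Var):
            if occurs(s, t, subst):
                return None
            subst[s] = t
        elif isinstance(t, Var):
            stack.append((t, s))
        elif s.op == t.op and len(s.args) == len(t.args):
            stack.extend(zip(s.args, t.args))
        else:
            return None   # operator/arity clash: the theorem does not apply
    return subst

# The model proposes a theorem whose conclusion mentions work variable W1;
# unifying it against the current goal instantiates W1 automatically, so
# the model never has to generate the concrete substitution itself.
goal  = App('=', (App('+', (App('A'), App('0'))), App('A')))    # ( A + 0 ) = A
concl = App('=', (App('+', (Var('W1'), App('0'))), Var('W1')))  # ( W1 + 0 ) = W1
print(unify(goal, concl))   # {Var(name='W1'): App(op='A', args=())}
```

Under this paradigm the language model only has to rank candidate theorems: when unification succeeds, the binding for each work variable falls out of the algorithm, and a clash (None) prunes the candidate immediately — exactly the logical constraint that direct substitution prediction lacks.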

Key words: Language models, Theorem proving, Metamath, Unification, Work variables

CLC Number: TP181
[1]DE MOURA L,KONG S,AVIGAD J,et al.The Lean theorem prover(system description)[C]//Automated Deduction-CADE-25.Berlin:Springer,2015:378-388.
[2]THE COQ DEVELOPMENT TEAM.The Coq proof assistant reference manual[R].INRIA Rocquencourt and ENS Lyon,1996.
[3]NIPKOW T,WENZEL M,PAULSON L C.Isabelle/HOL:A proof assistant for higher-order logic[M].Berlin:Springer,2002.
[4]POLU S,SUTSKEVER I.Generative language modeling for automated theorem proving[J].arXiv:2009.03393,2020.
[5]MEGILL N,WHEELER D A.Metamath:A computer language for mathematical proofs[M].Lulu.com,2019.
[6]LAMPLE G,LACROIX T,LACHAUX M A,et al.HyperTree proof search for neural theorem proving[C]//Advances in Neural Information Processing Systems 35.Curran Associates,2022:26337-26349.
[7]WANG M,DENG J.Learning to prove theorems by learning to generate theorems[C]//Advances in Neural Information Processing Systems 33.Curran Associates,2020:18146-18157.
[8]ROBINSON J A.A machine-oriented logic based on the resolution principle[J].Journal of the ACM,1965,12(1):23-41.
[9]BROWN T,MANN B,RYDER N,et al.Language models are few-shot learners[C]//Advances in Neural Information Processing Systems 33.Curran Associates,2020:1877-1901.
[10]WANG Y,LIU X,SHI S.Deep neural solver for math word problems[C]//Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing.2017:845-854.
[11]DUA D,WANG Y,DASIGI P,et al.DROP:A reading comprehension benchmark requiring discrete reasoning over paragraphs[J].arXiv:1903.00161,2019.
[12]LI Z,KOVACHKI N,AZIZZADENESHELI K,et al.Neural operator:Graph kernel network for partial differential equations[J].arXiv:2003.03485,2020.
[13]HAN J M,RUTE J,WU Y,et al.Proof artifact co-training for theorem proving with language models[J].arXiv:2102.06203,2021.
[14]WANG H,YUAN Y,LIU Z,et al.DT-Solver:Automated theorem proving with dynamic-tree sampling guided by proof-level value function[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics.2023:12632-12646.
[15]POLU S,HAN J M,ZHENG K,et al.Formal mathematics statement curriculum learning[J].arXiv:2202.01344,2022.
[16]YANG K,SWOPE A,GU A,et al.LeanDojo:Theorem proving with retrieval-augmented language models[C]//Advances in Neural Information Processing Systems 36.Curran Associates,2023:21573-21612.
[17]XIN R,XI C,YANG J,et al.BFS-Prover:Scalable best-first tree search for LLM-based automatic theorem proving[J].arXiv:2502.03438,2025.
[18]REN Z Z,SHAO Z,SONG J,et al.DeepSeek-Prover-V2:Advancing formal mathematical reasoning via reinforcement learning for subgoal decomposition[J].arXiv:2504.21801,2025.
[19]LI Z,SUN J,MURPHY L,et al.A survey on deep learning for theorem proving[J].arXiv:2404.09939,2024.
[20]WHALEN D.Holophrasm:A neural automated theorem prover for higher-order logic[J].arXiv:1608.02644,2016.
[21]LIN X,CAO Q,HUANG Y,et al.ATG:Benchmarking automated theorem generation for generative language models[J].arXiv:2405.06677,2024.
[22]ENDERTON H B.A mathematical introduction to logic[M].San Diego:Elsevier,2001.
[23]AN C,CHEN Z,YE Q,et al.Learn from failure:Fine-tuning LLMs with trial-and-error data for intuitionistic propositional logic proving[J].arXiv:2404.07382,2024.
[24]XUE L,BARUA A,CONSTANT N,et al.ByT5:Towards a token-free future with pre-trained byte-to-byte models[J].Transactions of the Association for Computational Linguistics,2022,10:291-306.
[25]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//Advances in Neural Information Processing Systems 30.Curran Associates,2017.
[26]HURST A,LERER A,GOUCHER A P,et al.GPT-4o system card[J].arXiv:2410.21276,2024.