Computer Science ›› 2026, Vol. 53 ›› Issue (4): 40-47. doi: 10.11896/jsjkx.251000066

• Interdisciplinary Integration of Artificial Intelligence and Theoretical Computer Science •

Automatic Theorem Proving Based on Pre-trained Language Models and Unification

CHEN Hongxiu, ZENG Xia, LIU Zhiming, ZHAO Hengjun   

  1. School of Computer and Information Science, School of Software, Southwest University, Chongqing 400715, China
  • Received: 2025-10-16 Revised: 2026-01-18 Published: 2026-04-15 Online: 2026-04-08
  • Corresponding author: ZHAO Hengjun (zhaohj2016@swu.edu.cn)
  • About author: (chx666@email.swu.edu.cn)
  • Supported by:
    National Natural Science Foundation of China (62272397, 62472362, 92582113) and Natural Science Foundation of Chongqing, China (CSTB2025NSCQ-LZX0019)

Automatic Theorem Proving Based on Pre-trained Language Models and Unification

CHEN Hongxiu, ZENG Xia, LIU Zhiming, ZHAO Hengjun   

  1. School of Computer and Information Science, School of Software, Southwest University, Chongqing 400715, China
  • Received:2025-10-16 Revised:2026-01-18 Published:2026-04-15 Online:2026-04-08
  • About author:CHEN Hongxiu,born in 2001,postgraduate,is a member of CCF(No.A12584G).His main research interests include automatic theorem proving and natural language processing.
    ZHAO Hengjun,born in 1985,Ph.D,associate professor,is a member of CCF(No.50534M).His main research interests include software theory and methods.
  • Supported by:
    National Natural Science Foundation of China(62272397,62472362,92582113) and Natural Science Foundation of Chongqing,China(CSTB2025NSCQ-LZX0019).

Abstract: Pre-trained language models have shown considerable potential for proving theorems in formal environments such as Metamath. However, this potential has yet to translate reliably into sound reasoning ability. Existing methods typically require the language model to directly predict substitution terms, which are drawn from an open and potentially infinite expression space without logical constraints, limiting the model's generalization. To address this, a new proof paradigm is proposed: work variables are introduced during inference to temporarily stand in for concrete terms, and their concrete instances are then derived by a unification algorithm. This design lets the model focus on theorem selection instead of generating substitutions that lack logical guidance. Based on this paradigm, data are extracted from the Metamath mathematical library to build and train UnProver (Unification-Driven Prover). In addition, several data augmentation strategies are designed to further improve UnProver's performance. Experimental results show that UnProver outperforms baseline methods, including GPT-4o, on the test set, achieving better proving capability and efficiency. Moreover, UnProver discovered six new, more concise proofs, which have been formally accepted into the official Metamath mathematical library.

Keywords: Language models, Theorem proving, Metamath, Unification, Work variables

Abstract: Pre-trained language models(PLMs) have demonstrated considerable potential in proving theorems within formal environments such as Metamath.However,this potential remains difficult to translate into reliable reasoning ability.Existing approaches typically require PLMs to directly predict substitutions,which come from an open and potentially infinite expression space.The absence of logical constraints in this process limits the generalization capability of the models.To address this challenge,a new proof paradigm is introduced:work variables temporarily substitute concrete terms during inference,and their specific instances are derived through the unification algorithm.This design enables the model to focus on theorem selection rather than generating unguided substitutions.Based on this paradigm,a dataset is extracted from the Metamath library to construct UnProver(Unification-Driven Prover),which is trained on this data.In addition,several data augmentation strategies are designed to further enhance UnProver's performance.Experimental results show that UnProver consistently outperforms baseline methods,including GPT-4o,on test sets,achieving superior performance in both proving capability and efficiency.Moreover,UnProver discovered six new and shorter proofs,which have been formally accepted into the official Metamath library.

Key words: Language models, Theorem proving, Metamath, Unification, Work variables
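As a concrete illustration of the paradigm the abstract describes, the sketch below shows Robinson-style first-order unification with placeholder "work variables": the prover only needs to select a theorem, and the substitution for each work variable falls out mechanically. The term representation and all names here are illustrative assumptions for this sketch, not UnProver's actual data structures.

```python
# Illustrative sketch of Robinson-style first-order unification with
# "work variables". Representation is an assumption for this example:
# a variable is a string starting with "?", a compound term is a tuple
# (functor, arg1, ...), and anything else is a constant.

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, subst):
    # Chase variable bindings until an unbound variable or non-variable.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    # Occurs check: does variable v appear inside term t?
    t = walk(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t[1:])

def unify(a, b, subst=None):
    # Return a substitution (dict) unifying a and b, or None on failure.
    subst = dict(subst or {})
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        if occurs(a, b, subst):
            return None  # e.g. ?x vs f(?x) has no finite unifier
        subst[a] = b
        return subst
    if is_var(b):
        return unify(b, a, subst)
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None

# A goal containing a work variable ?w unifies with a theorem conclusion
# (add ?x ?y); the model never has to generate the substitution itself.
s = unify(("add", "2", "?w"), ("add", "?x", "?y"))
```

The occurs check is what keeps the derived substitutions logically sound: a work variable is never bound to a term containing itself, so every instance produced is a well-formed expression.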

CLC number: 

  • TP181
[1]DE MOURA L,KONG S,AVIGAD J,et al.The Lean theorem prover(system description)[C]//Automated Deduction-CADE-25.Berlin:Springer,2015:378-388.
[2]The Coq Development Team.The Coq proof assistant-reference manual[R].INRIA Rocquencourt and ENS Lyon,1996.
[3]NIPKOW T,WENZEL M,PAULSON L C.Isabelle/HOL:A proof assistant for higher-order logic[M].Berlin:Springer,2002.
[4]POLU S,SUTSKEVER I.Generative language modeling for automated theorem proving[J].arXiv:2009.03393,2020.
[5]MEGILL N,WHEELER D A.Metamath:A computer language for mathematical proofs[M].Lulu.com,2019.
[6]LAMPLE G,LACROIX T,LACHAUX M A,et al.Hypertree proof search for neural theorem proving[C]//Advances in Neural Information Processing Systems 35.Curran Associates,2022:26337-26349.
[7]WANG M,DENG J.Learning to prove theorems by learning to generate theorems[C]//Advances in Neural Information Processing Systems 33.Curran Associates,2020:18146-18157.
[8]ROBINSON J A.A machine-oriented logic based on the resolution principle[J].Journal of the ACM,1965,12(1):23-41.
[9]BROWN T,MANN B,RYDER N,et al.Language models are few-shot learners[C]//Advances in Neural Information Processing Systems 33.Curran Associates,2020:1877-1901.
[10]WANG Y,LIU X,SHI S.Deep neural solver for math word problems[C]//Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing.2017:845-854.
[11]DUA D,WANG Y,DASIGI P,et al.DROP:A reading comprehension benchmark requiring discrete reasoning over paragraphs[J].arXiv:1903.00161,2019.
[12]LI Z,KOVACHKI N,AZIZZADENESHELI K,et al.Neural operator:Graph kernel network for partial differential equations[J].arXiv:2003.03485,2020.
[13]HAN J M,RUTE J,WU Y,et al.Proof artifact co-training for theorem proving with language models[J].arXiv:2102.06203,2021.
[14]WANG H,YUAN Y,LIU Z,et al.DT-Solver:Automated theorem proving with dynamic-tree sampling guided by proof-level value function[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics.2023:12632-12646.
[15]POLU S,HAN J M,ZHENG K,et al.Formal mathematics statement curriculum learning[J].arXiv:2202.01344,2022.
[16]YANG K,SWOPE A,GU A,et al.LeanDojo:Theorem proving with retrieval-augmented language models[C]//Advances in Neural Information Processing Systems 36.Curran Associates,2023:21573-21612.
[17]XIN R,XI C,YANG J,et al.BFS-Prover:Scalable best-first tree search for LLM-based automatic theorem proving[J].arXiv:2502.03438,2025.
[18]REN Z Z,SHAO Z,SONG J,et al.DeepSeek-Prover-V2:advancing formal mathematical reasoning via reinforcement learning for subgoal decomposition[J].arXiv:2504.21801,2025.
[19]LI Z,SUN J,MURPHY L,et al.A survey on deep learning for theorem proving[J].arXiv:2404.09939,2024.
[20]WHALEN D.Holophrasm:a neural automated theorem prover for higher-order logic[J].arXiv:1608.02644,2016.
[21]LIN X,CAO Q,HUANG Y,et al.ATG:Benchmarking automated theorem generation for generative language models[J].arXiv:2405.06677,2024.
[22]ENDERTON H B.A mathematical introduction to logic[M].San Diego:Elsevier,2001.
[23]AN C,CHEN Z,YE Q,et al.Learn from failure:Fine-tuning LLMs with trial-and-error data for intuitionistic propositional logic proving[J].arXiv:2404.07382,2024.
[24]XUE L,BARUA A,CONSTANT N,et al.ByT5:Towards a token-free future with pre-trained byte-to-byte models[J].Transactions of the Association for Computational Linguistics,2022,10:291-306.
[25]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//Advances in Neural Information Processing Systems 30.Curran Associates,2017.
[26]HURST A,LERER A,GOUCHER A P,et al.GPT-4o system card[J].arXiv:2410.21276,2024.