Computer Science (计算机科学) ›› 2026, Vol. 53 ›› Issue (4): 40-47. doi: 10.11896/jsjkx.251000066
CHEN Hongxiu, ZENG Xia, LIU Zhiming, ZHAO Hengjun
Abstract: Pretrained language models have shown remarkable potential for proving theorems in formal environments such as Metamath. However, this potential has yet to translate reliably into robust reasoning ability. Existing approaches typically require the language model to directly predict substitution terms, which are drawn from an open and potentially infinite expression space with no logical constraints, limiting the model's generalization. To address this, a new proving paradigm is proposed: working variables are introduced during inference as temporary placeholders for concrete terms, and their instantiations are later derived by a unification algorithm. This design lets the model focus on theorem selection rather than on generating substitutions without logical guidance. Based on this paradigm, data is extracted from the Metamath mathematical library to construct UnProver (Unification-Driven Prover), which is then trained on this data. In addition, several data augmentation strategies are designed to further improve UnProver's performance. Experimental results show that UnProver outperforms baselines including GPT-4o on the test set, achieving better results in both proving capability and efficiency. Moreover, UnProver discovered six new, more concise proofs, which have been officially incorporated into the Metamath mathematical library.
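The core mechanism the abstract describes, deferring concrete substitutions to working variables that are later solved for by unification, can be illustrated with classic first-order (Robinson-style) unification. The sketch below is a toy illustration, not the UnProver implementation: the term encoding (tuples for compound terms, strings prefixed with `?` for working variables) is an assumption made for this example.

```python
def is_var(t):
    """A working variable is a string starting with '?'."""
    return isinstance(t, str) and t.startswith("?")

def walk(t, subst):
    """Follow variable bindings to the current representative term."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    """Occurs check: does variable v appear inside term t?"""
    t = walk(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, a, subst) for a in t)
    return False

def unify(a, b, subst=None):
    """Return the most general unifier of a and b (a dict), or None on failure."""
    subst = dict(subst or {})
    stack = [(a, b)]
    while stack:
        s, t = stack.pop()
        s, t = walk(s, subst), walk(t, subst)
        if s == t:
            continue
        if is_var(s):
            if occurs(s, t, subst):   # reject cyclic bindings like ?x = f(?x)
                return None
            subst[s] = t
        elif is_var(t):
            stack.append((t, s))
        elif isinstance(s, tuple) and isinstance(t, tuple) and len(s) == len(t):
            stack.extend(zip(s, t))
        else:
            return None               # clash of distinct constants/arities
    return subst

# A working variable ?x left open during proof search is instantiated
# by matching the goal against a theorem's conclusion:
print(unify(("+", "?x", "0"), ("+", "2", "0")))  # {'?x': '2'}
```

In this framing, the language model only has to choose which theorem to apply; the concrete instance of each working variable falls out of solving the resulting term equations, which is exactly the burden the paradigm removes from the model.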