Computer Science ›› 2026, Vol. 53 ›› Issue (4): 33-39. doi: 10.11896/jsjkx.250600129

• Interdisciplinary Integration of Artificial Intelligence and Theoretical Computer Science •

Rethinking Deep Generalization Mechanisms: Establishment of Uniform Convergence Bounds Under Overparameterization and High-dimensional Noise Perturbations

LI Pengqi, DING Lizhong, ZHANG Chunhui, FU Jiarun

  1. School of Computer Science, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2025-06-20  Revised: 2026-01-10  Published: 2026-04-15  Online: 2026-04-08
  • Corresponding author: DING Lizhong (lizhong.ding@outlook.com)
  • About the authors: LI Pengqi (pengqi.li@bit.edu.cn), born in 2002, Ph.D. candidate. His main research interests include deep statistical learning theory, kernel learning, uncertainty in large language models, and physical AI.
    DING Lizhong, born in 1986, professor. His main research interests include deep statistical learning theory and methods, the emergence and reasoning mechanisms of large-scale models, statistical hypothesis testing and deep generative models, and neural-symbolic learning theory and methods.
  • Supported by:
    National Key Research and Development Program of China (2022YFB2703100), National Natural Science Foundation of China (62376028, U22A2099), and the Excellent Young Scientists Fund (Overseas) of the National Natural Science Foundation of China

Abstract: Deep neural networks combine powerful expressive capacity with excellent generalization performance, which fundamentally conflicts with the classical statistical-learning tenet that "model complexity harms generalization" and leaves the analysis of deep generalization mechanisms intractable under traditional frameworks. Classical uniform convergence theory, constrained by its reliance on parameter-space dimensionality and its neglect of the algorithm's implicit bias, fails to align with the core characteristics of deep networks. To address this theoretical gap, this paper constructs a novel statistical learning framework that integrates key features of deep models, thereby redefining the explanatory paradigm of uniform convergence theory for deep generalization mechanisms. By introducing a surrogate linear model that preserves the overparameterized structure and high-dimensional noise perturbations of deep networks, it derives the first effective uniform convergence bound in this setting, revealing a benign role of high-dimensional noise in improving generalization beyond the reach of classical low-dimensional theory. Building on this mechanism, it further proposes a data-scale-sensitive regularized training scheme and shows that the uniform convergence bound and the generalization error decay in step as the sample complexity grows, confirming the explanatory power of uniform convergence theory for deep models. Supported by both theoretical and empirical evidence, this work breaks through the adaptability bottleneck of uniform convergence bounds and reopens a door that was about to be closed on uniform-convergence analysis of deep model generalization.

Key words: Generalization error, Uniform convergence bound, Pruned hypothesis space, High-dimensional probability, Generalization mechanism

CLC number: 

  • TP391
[1]HE K,ZHANG X,REN S,et al.Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:770-778.
[2]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems.2017:5998-6008.
[3]BROWN T B,MANN B,RYDER N,et al.Language models are few-shot learners[J].Advances in Neural Information Processing Systems,2020,33:1877-1901.
[4]BELKIN M,HSU D,MA S,et al.Reconciling modern machine-learning practice and the classical bias-variance trade-off[J].Proceedings of the National Academy of Sciences,2019,116(32):15849-15854.
[5]CYBENKO G.Approximation by superpositions of a sigmoidal function[J].Mathematics of Control,Signals and Systems,1989,2(4):303-314.
[6]ERINGIS D,HOFMANN T,RAKHLIN A.PAC-Bayes generalisation bounds for dynamical systems including stable RNNs[J].Proceedings of the AAAI Conference on Artificial Intelligence,2024,38(11):11901-11909.
[7]DZIUGAITE G K,ROY D M.Computing non-vacuous generalization bounds for deep(stochastic) neural networks with many more parameters than training data[J].arXiv:1703.11008,2017.
[8]BARTLETT P L,FOSTER D J,TELGARSKY M J.Spectrally-normalized margin bounds for neural networks[C]//Advances in Neural Information Processing Systems.California:NIPS,2017:6241-6250.
[9]BARTLETT P L,MENDELSON S.Rademacher and Gaussian complexities:risk bounds and structural results[J].Journal of Machine Learning Research,2002,3(11):463-482.
[10]SACHS S,OBRIST R,SIMCHI-LEVI D,et al.Generalization guarantees via algorithm-dependent Rademacher complexity[C]//Proceedings of the 36th Annual Conference on Learning Theory.Bangalore:COLT,2023:4863-4880.
[11]DU S S,LEE J D,LI H,et al.Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks[C]//Proceedings of the 36th International Conference on Machine Learning.Long Beach,CA:PMLR,2019:1617-1626.
[12]HARDT M,RECHT B,SINGER Y.Train faster,generalizebetter:stability of stochastic gradient descent[C]//Proceedings of the 33rd International Conference on Machine Learning.New York:PMLR,2016:1225-1234.
[13]MOU W,WANG L,ZOU D,et al.Generalization bounds of SGLD for non-convex learning:two theoretical viewpoints[J].arXiv:1707.05947,2017.
[14]ATTIA A,KOREN T.Uniform stability for first-order empirical risk minimization[C]//Proceedings of the Thirty-Fifth Conference on Learning Theory.PMLR,2022:3313-3332.
[15]JIANG Y,NASSER Y,RAGHAVAN D,et al.NeurIPS 2020 competition:predicting generalization in deep learning[C]//Proceedings of the NeurIPS 2020 Competition and Demonstration Track.Vancouver,BC:NeurIPS,2020:170-190.
[16]ADVANI M S,SAXE A M.High-dimensional dynamics of gene-ralization error in neural networks[J].arXiv:1710.03667,2017.
[17]NAGARAJAN V,KOLTER J Z.Uniform convergence may be unable to explain generalization in deep learning[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems.2019:11615-11626.
[18]KAWAGUCHI K,ZHANG L,BENGIO Y,et al.How does information bottleneck help deep learning?[C]//International Conference on Machine Learning(ICML).ICML,2023.
[19]ARORA S,GE R,NEYSHABUR B,et al.Stronger generalization bounds for deep nets via a compression approach[C]//Proceedings of the 35th International Conference on Machine Learning.PMLR,2018:3597-3606.
[20]CHATTERJEE S,ZIELINSKI P.On the generalization mystery in deep learning[J].arXiv:2203.10036,2022.
[21]DWORK C,ROTH A.The algorithmic foundations of differential privacy[J].Foundations and Trends in Theoretical Computer Science,2014,9(3/4):211-407.
[22]VASILEIOU A,JEGELKA S,LEVIE R,et al.Survey on generalization theory for graph neural networks[J].arXiv:2503.15650,2025.
[23]SUN T,LIN J.PAC-Bayesian adversarially robust generalization bounds for graph neural network[J].arXiv:2402.04038,2024.
[24]NEYSHABUR B,TOMIOKA R,SREBRO N.In search of the real inductive bias:on the role of implicit regularization in deep learning[C]//Proceedings of the 3rd International Conference on Learning Representations(ICLR).ICLR,2015.
[25]ZHANG C Y,BENGIO S,HARDT M,et al.Understanding deep learning requires rethinking generalization[C]//Proceedings of the 5th International Conference on Learning Representations(ICLR).ICLR,2017.
[26]ZHOU W,SUN Q,POGGIO T,et al.Non-vacuous generalization bounds at the ImageNet scale:a PAC-Bayesian compression approach[C]//Proceedings of the 7th International Conference on Learning Representations(ICLR).New York:ICLR,2019:1-14.
[27]NAGARAJAN V,KOLTER J Z.Generalization in deep net-works:the role of distance from initialization[J].arXiv:1901.01672,2019.
[28]YUN C,BHOJANAPALLI S,RAWAT A S,et al.Are trans-formers universal approximators of sequence-to-sequence functions?[J].arXiv:1912.10077,2019.
[29]CORTES C,VAPNIK V.Support-vector networks[J].Machine Learning,1995,20:273-297.
[30]BISHOP C M,NASRABADI N M.Pattern recognition and machine learning[M].New York:Springer,2006.
[31]HOSMER D W,LEMESHOW S,STURDIVANT R X.Applied logistic regression[M].John Wiley & Sons,2013.
[32]REN R F,LIU Y.Towards understanding how transformers learn in-context through a representation learning lens[J].Advances in Neural Information Processing Systems,2025,37:892-933.
[33]JACOT A,GABRIEL F,HONGLER C.Neural tangent kernel:convergence and generalization in neural networks[C]//Advances in Neural Information Processing Systems.2018:8571-8580.
[34]TAN L,WU S,ZHOU W,et al.Weighted neural tangent kernel:a generalized and improved network-induced kernel[J].Machine Learning,2023,112(8):2871-2901.
[35]JI Z,TELGARSKY M.Polylogarithmic width suffices for gra-dient descent to achieve arbitrarily small test error with shallow ReLU networks[J].arXiv:1909.12292,2019.
[36]VERSHYNIN R.High-dimensional probability:an introduction with applications in data science[M].Cambridge:Cambridge University Press,2018.
[37]ZHANG X H,LI M,WANG J,et al.A joint training framework for learning under label noise[J].Journal of Computer Research and Development,2022,59(10):2021-2035.
[38]NEYSHABUR B,BHATIA K,BARTLETT P L,et al.A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks[J].arXiv:1707.09564,2017.
[39]NEYSHABUR B,LI Z,BHOJANAPALLI S,et al.Towards understanding the role of over-parametrization in generalization of neural networks[C]//Proceedings of the 7th International Conference on Learning Representations(ICLR).ICLR,2019.