Computer Science ›› 2026, Vol. 53 ›› Issue (4): 33-39.doi: 10.11896/jsjkx.250600129

• Interdisciplinary Integration of Artificial Intelligence and Theoretical Computer Science •

Rethinking Deep Generalization Mechanisms: Establishment of Uniform Convergence Bounds Under Overparameterization and High-dimensional Noise Perturbations

LI Pengqi, DING Lizhong, ZHANG Chunhui, FU Jiarun   

  1. School of Computer Science, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2025-06-20  Revised: 2026-01-10  Online: 2026-04-15  Published: 2026-04-08
  • About author: LI Pengqi, born in 2002, Ph.D. candidate. His main research interests include deep statistical learning theory, kernel learning, uncertainty in large language models, and physical AI.
    DING Lizhong, born in 1986, professor. His main research interests include deep statistical learning theory and methods, the emergence and reasoning mechanisms of large-scale models, statistical hypothesis testing and deep generative models, and neural-symbolic learning theory and methods.
  • Supported by:
    National Key Research and Development Program of China (2022YFB2703100), National Natural Science Foundation of China (62376028, U22A2099) and the Excellent Young Scientists Fund (Overseas) of the National Natural Science Foundation of China.

Abstract: Deep neural networks exhibit both powerful expressive capacity and exceptional generalization performance, which fundamentally conflicts with the classical statistical learning tenet that “model complexity harms generalization” and makes the analysis of deep generalization mechanisms intractable under traditional frameworks. Classical uniform convergence theory, constrained by its reliance on parameter-space dimensionality and its neglect of the implicit bias of training algorithms, fails to align with the core characteristics of deep networks. To address this theoretical gap, this paper constructs a novel statistical learning framework that integrates key features of deep models, thereby redefining the explanatory paradigm of uniform convergence theory for deep generalization mechanisms. By introducing a surrogate linear model that preserves overparameterization and high-dimensional noise perturbations, it derives the first effective uniform convergence bound for deep networks and reveals a benign role of high-dimensional noise in improving generalization, beyond what classical low-dimensional theory predicts. Building on this generalization mechanism, it further proposes a scale-sensitive regularized training scheme and shows that both the bound and the generalization error decay as the sample size increases. Supported by theoretical and empirical evidence, this work overcomes the applicability bottleneck of uniform convergence bounds and reopens the door for uniform convergence theory to explain the generalization of deep models.
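To make the theoretical gap concrete, the standard uniform convergence guarantee can be sketched as follows. This is the textbook Rademacher-complexity bound for a loss bounded in [0,1], not the specific bound derived in this paper: with probability at least 1-\delta over an i.i.d. sample of size n,

\[
  R(f)\;\le\;\widehat{R}_n(f)\;+\;2\,\mathfrak{R}_n(\ell\circ\mathcal{F})\;+\;\sqrt{\frac{\log(1/\delta)}{2n}}
  \qquad \text{for all } f\in\mathcal{F},
\]

where R and \widehat{R}_n denote the population and empirical risks and \mathfrak{R}_n(\ell\circ\mathcal{F}) is the Rademacher complexity of the loss class induced by the hypothesis class \mathcal{F}. When this complexity is controlled only through parameter counts or VC-type arguments, it does not vanish for networks with far more parameters than training samples, which is exactly why the bound becomes vacuous in the overparameterized regime; restricting attention to a pruned hypothesis space reachable by the training algorithm is what makes a non-vacuous bound possible.

The surrogate construction can also be illustrated numerically. The following minimal NumPy sketch fits an overparameterized linear model whose inputs consist of a planted low-dimensional signal plus many high-dimensional noise coordinates, and reports how the test error behaves as the sample size grows. The data model, the ridge penalty standing in for a scale-sensitive regularizer, and all constants are illustrative assumptions, not the paper's construction.

import numpy as np

rng = np.random.default_rng(0)
d_signal, d_noise, sigma = 20, 2000, 0.1                 # d = 2020 features, always larger than n below
w_star = rng.normal(size=d_signal) / np.sqrt(d_signal)   # planted low-dimensional signal

def sample(n):
    """Draw n points: signal coordinates plus isotropic high-dimensional noise coordinates."""
    X_sig = rng.normal(size=(n, d_signal))
    X_noise = rng.normal(size=(n, d_noise)) / np.sqrt(d_noise)  # high-dimensional noise perturbation
    X = np.hstack([X_sig, X_noise])
    y = X_sig @ w_star + sigma * rng.normal(size=n)
    return X, y

def ridge_fit(X, y, lam):
    """Ridge regression in dual form, so the linear system stays n x n even though d >> n."""
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha

X_test, y_test = sample(5000)
for n in (50, 100, 200, 400, 800):
    X_tr, y_tr = sample(n)
    w_hat = ridge_fit(X_tr, y_tr, lam=0.01 * n)   # penalty scaled with n: an illustrative "scale-sensitive" choice
    print(f"n={n:4d}  train MSE={np.mean((X_tr @ w_hat - y_tr)**2):.4f}  "
          f"test MSE={np.mean((X_test @ w_hat - y_test)**2):.4f}")

Although the model has more free parameters than training points at every n, the test error should decrease toward the noise level sigma^2 as n grows, which is the qualitative behavior the abstract attributes to the benign role of high-dimensional noise and to scale-sensitive regularization.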

Key words: Generalization error, Uniform convergence bound, Pruned hypothesis space, High-dimensional probability, Generalization mechanism

CLC Number: TP391