Computer Science, 2026, Vol. 53, Issue (4): 33-39. doi: 10.11896/jsjkx.250600129
• Interdisciplinary Integration of Artificial Intelligence and Theoretical Computer Science •
LI Pengqi, DING Lizhong, ZHANG Chunhui, FU Jiarun
[1]HE K,ZHANG X,REN S,et al.Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:770-778.
[2]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems.2017:5998-6008.
[3]BROWN T B,MANN B,RYDER N,et al.Language models are few-shot learners[J].Advances in Neural Information Processing Systems,2020,33:1877-1901.
[4]BELKIN M,HSU D,MA S,et al.Reconciling modern machine learning practice and the classical bias-variance trade-off[J].Proceedings of the National Academy of Sciences,2019,116(32):15849-15854.
[5]CYBENKO G.Approximation by superpositions of a sigmoidal function[J].Mathematics of Control,Signals and Systems,1989,2(4):303-314.
[6]ERINGIS D,HOFMANN T,RAKHLIN A.PAC-Bayes generalisation bounds for dynamical systems including stable RNNs[J].Proceedings of the AAAI Conference on Artificial Intelligence,2024,38(11):11901-11909.
[7]DZIUGAITE G K,ROY D M.Computing non-vacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data[J].arXiv:1703.11008,2017.
[8]BARTLETT P L,FOSTER D J,TELGARSKY M J.Spectrally-normalized margin bounds for neural networks[C]//Advances in Neural Information Processing Systems.2017:6241-6250.
[9]BARTLETT P L,MENDELSON S.Rademacher and Gaussian complexities:risk bounds and structural results[J].Journal of Machine Learning Research,2002,3:463-482.
[10]SACHS S,OBRIST R,SIMCHI-LEVI D,et al.Generalization guarantees via algorithm-dependent Rademacher complexity[C]//Proceedings of the 36th Annual Conference on Learning Theory.Bangalore:PMLR,2023:4863-4880.
[11]DU S S,LEE J D,LI H,et al.Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks[C]//Proceedings of the 36th International Conference on Machine Learning.Long Beach,CA:PMLR,2019:1617-1626.
[12]HARDT M,RECHT B,SINGER Y.Train faster,generalize better:stability of stochastic gradient descent[C]//Proceedings of the 33rd International Conference on Machine Learning.New York:PMLR,2016:1225-1234.
[13]MOU W,WANG L,ZOU D,et al.Generalization bounds of SGLD for non-convex learning:two theoretical viewpoints[J].arXiv:1707.05947,2017.
[14]ATTIA A,KOREN T.Uniform stability for first-order empirical risk minimization[C]//Proceedings of the 35th Conference on Learning Theory.PMLR,2022:3313-3332.
[15]JIANG Y,NASSER Y,RAGHAVAN D,et al.NeurIPS 2020 competition:predicting generalization in deep learning[C]//Proceedings of the NeurIPS 2020 Competition and Demonstration Track.Vancouver,BC:NeurIPS,2020:170-190.
[16]ADVANI M S,SAXE A M.High-dimensional dynamics of generalization error in neural networks[J].arXiv:1710.03667,2017.
[17]NAGARAJAN V,KOLTER J Z.Uniform convergence may be unable to explain generalization in deep learning[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems.2019:11615-11626.
[18]KAWAGUCHI K,ZHANG L,BENGIO Y,et al.How does information bottleneck help deep learning?[C]//Proceedings of the 40th International Conference on Machine Learning.PMLR,2023.
[19]ARORA S,GE R,NEYSHABUR B,et al.Stronger generalization bounds for deep nets via a compression approach[C]//Proceedings of the 35th International Conference on Machine Learning.PMLR,2018:3597-3606.
[20]CHATTERJEE S,ZIELINSKI P.On the generalization mystery in deep learning[J].arXiv:2203.10036,2022.
[21]DWORK C,ROTH A.The algorithmic foundations of differential privacy[J].Foundations and Trends in Theoretical Computer Science,2014,9(3/4):211-407.
[22]VASILEIOU A,JEGELKA S,LEVIE R,et al.Survey on generalization theory for graph neural networks[J].arXiv:2503.15650,2025.
[23]SUN T,LIN J.PAC-Bayesian adversarially robust generalization bounds for graph neural network[J].arXiv:2402.04038,2024.
[24]NEYSHABUR B,TOMIOKA R,SREBRO N.In search of the real inductive bias:on the role of implicit regularization in deep learning[C]//Proceedings of the 3rd International Conference on Learning Representations(ICLR).2015.
[25]ZHANG C Y,BENGIO S,HARDT M,et al.Understanding deep learning requires rethinking generalization[C]//Proceedings of the 5th International Conference on Learning Representations(ICLR).2017.
[26]ZHOU W,SUN Q,POGGIO T,et al.Non-vacuous generalization bounds at the ImageNet scale:a PAC-Bayesian compression approach[C]//Proceedings of the 7th International Conference on Learning Representations(ICLR).2019:1-14.
[27]NAGARAJAN V,KOLTER J Z.Generalization in deep networks:the role of distance from initialization[J].arXiv:1901.01672,2019.
[28]YUN C,BHOJANAPALLI S,RAWAT A S,et al.Are transformers universal approximators of sequence-to-sequence functions?[J].arXiv:1912.10077,2019.
[29]CORTES C,VAPNIK V.Support-vector networks[J].Machine Learning,1995,20(3):273-297.
[30]BISHOP C M,NASRABADI N M.Pattern recognition and machine learning[M].New York:Springer,2006.
[31]HOSMER D W,LEMESHOW S,STURDIVANT R X.Applied logistic regression[M].Hoboken:John Wiley & Sons,2013.
[32]REN R F,LIU Y.Towards understanding how transformers learn in-context through a representation learning lens[J].Advances in Neural Information Processing Systems,2024,37:892-933.
[33]JACOT A,GABRIEL F,HONGLER C.Neural tangent kernel:convergence and generalization in neural networks[C]//Advances in Neural Information Processing Systems.2018:8571-8580.
[34]TAN L,WU S,ZHOU W,et al.Weighted neural tangent kernel:a generalized and improved network-induced kernel[J].Machine Learning,2023,112(8):2871-2901.
[35]JI Z,TELGARSKY M.Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks[J].arXiv:1909.12292,2019.
[36]VERSHYNIN R.High-dimensional probability:an introduction with applications in data science[M].Cambridge:Cambridge University Press,2018.
[37]ZHANG X H,LI M,WANG J,et al.A joint training framework for learning under label noise[J].Journal of Computer Research and Development,2022,59(10):2021-2035.
[38]NEYSHABUR B,BHOJANAPALLI S,MCALLESTER D,et al.A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks[J].arXiv:1707.09564,2017.
[39]NEYSHABUR B,LI Z,BHOJANAPALLI S,et al.Towards understanding the role of over-parametrization in generalization of neural networks[C]//Proceedings of the 7th International Conference on Learning Representations(ICLR).2019.