Novel Normalization Algorithm for Training of Deep Neural Networks with Small Batch Sizes

Computer Science, 2019, Vol. 46, Issue (11A): 273-276.

WANG Yan, WU Xiao-fu

(School of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)

Online: 2019-11-10 Published: 2019-11-20
Corresponding author: WU Xiao-fu (born 1975), male, Ph.D. His main research interests are machine learning (artificial intelligence signal processing) and computer vision. E-mail: xfuwu@njupt.edu
About the author: WANG Yan (born 1995), female, M.S. Her main research interest is image classification. E-mail: hefeiwangyande@126.com
Funding:
This work was supported by the National Natural Science Foundation of China (61372123, 61401228, 61671253) and the Scientific Research Foundation of Nanjing University of Posts and Telecommunications (NY213002).

Abstract

Abstract: In recent years, the Batch Normalization (BN) algorithm has become an indispensable part of training deep neural networks. BN normalizes the input using the mean and variance computed over the examples in a batch, which mitigates exploding or vanishing gradients during training. However, because the algorithm depends on the batch size, its performance degrades on small batches, where the estimates of the mean and variance are inaccurate. Batch ReNormalization (BRN) instead normalizes the input with exponential moving average (EMA) values, which reduces the dependence of the normalization on the batch. This paper studies normalization with small-batch input on image classification tasks and proposes a batch normalization algorithm that obtains more accurate parameter estimates by changing the initial value of the EMA and applying a correction to the estimates. Experimental results show that, compared with the standard BN and BRN algorithms, the proposed algorithm converges faster and improves accuracy to some extent.

Key words: Exponential moving average, Image classification, Normalization algorithm, Small batches
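
The abstract does not spell out how the EMA initial value is changed or how the estimates are corrected. As a minimal sketch of the general idea only, the Python below starts an EMA at zero and divides by `1 - momentum ** steps`, a bias correction borrowed from the Adam optimizer; this is an illustrative assumption and not necessarily the authors' method. The names `batch_stats`, `CorrectedEMA`, and `run_demo` are hypothetical.

```python
import random


def batch_stats(batch):
    """Mean and biased variance of one mini-batch, the statistics BN relies on."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return mean, var


class CorrectedEMA:
    """EMA initialized at zero with a bias-correction factor.

    Assumption: an Adam-style correction stands in for the paper's
    unspecified "correction to the estimates".
    """

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.raw = 0.0    # uncorrected running average, starts at zero
        self.steps = 0    # number of updates seen so far

    def update(self, value):
        self.steps += 1
        self.raw = self.momentum * self.raw + (1.0 - self.momentum) * value
        return self.value

    @property
    def value(self):
        if self.steps == 0:
            return 0.0
        # Undo the shrinkage toward zero caused by the zero initial value.
        return self.raw / (1.0 - self.momentum ** self.steps)


def run_demo(batch_size=2, steps=500, momentum=0.99, seed=0):
    """Track EMA estimates of batch mean and variance over tiny batches."""
    rng = random.Random(seed)
    ema_mean = CorrectedEMA(momentum)
    ema_var = CorrectedEMA(momentum)
    for _ in range(steps):
        # Each batch is drawn from a Gaussian with mean 3 and std 1.
        batch = [rng.gauss(3.0, 1.0) for _ in range(batch_size)]
        mean, var = batch_stats(batch)
        ema_mean.update(mean)
        ema_var.update(var)
    return ema_mean.value, ema_var.value


if __name__ == "__main__":
    print(run_demo())
```

With a batch size of 2, each per-batch mean is a noisy estimate, while the corrected EMA averages over many batches and is usable from the first step because the correction removes the pull toward the zero initial value.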

CLC Number: TP183

WANG Yan, WU Xiao-fu. Novel Normalization Algorithm for Training of Deep Neural Networks with Small Batch Sizes[J]. Computer Science, 2019, 46(11A): 273-276. https://doi.org/

