Computer Science ›› 2019, Vol. 46 ›› Issue (11A): 273-276.

• Pattern Recognition & Image Processing •

Novel Normalization Algorithm for Training of Deep Neural Networks with Small Batch Sizes

WANG Yan, WU Xiao-fu   

  1. (School of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)
  • Online: 2019-11-10 Published: 2019-11-20

Abstract: The Batch Normalization (BN) algorithm has become a key ingredient of the standard toolkit for training deep neural networks. BN normalizes the input with the mean and variance computed over each batch, which mitigates exploding or vanishing gradients during training. However, the performance of BN often degrades at small batch sizes because the batch estimates of mean and variance become inaccurate. Batch Renormalization (BRN) normalizes the input with exponential moving average (EMA) values instead, reducing the dependence of the normalization on individual batches. This paper proposes a novel normalization algorithm that improves the estimates of the moving mean and variance by changing the initial value of the EMA and adding corrections to the estimates. Experimental results show that the proposed algorithm outperforms both the standard BN and BRN algorithms in convergence speed and accuracy.
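
The abstract does not give the update formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a scalar feature, a momentum of 0.9, and an Adam-style bias correction; the function name `ema_estimates` and all parameter values are hypothetical. It compares an EMA of batch means started at zero, one started at the first batch mean, and a zero-started EMA with a correction factor.

```python
import random


def ema_estimates(values, momentum=0.9):
    """Track three EMA variants over a sequence of batch statistics.

    Returns (zero_init, first_init, corrected), one list per variant.
    All three are illustrative assumptions, not the paper's exact method.
    """
    zero_init, first_init, corrected = [], [], []
    m_zero = 0.0      # EMA started at 0 (biased toward 0 early on)
    m_first = None    # EMA started at the first observed value
    for t, x in enumerate(values, start=1):
        m_zero = momentum * m_zero + (1.0 - momentum) * x
        if m_first is None:
            m_first = x
        else:
            m_first = momentum * m_first + (1.0 - momentum) * x
        zero_init.append(m_zero)
        first_init.append(m_first)
        # Adam-style correction: divide out the weight still held by the
        # zero initial value after t updates.
        corrected.append(m_zero / (1.0 - momentum ** t))
    return zero_init, first_init, corrected


def batch_mean(batch):
    return sum(batch) / len(batch)


# Small batches (size 2) drawn from a distribution whose true mean is 5.0.
rng = random.Random(0)
batch_means = [
    batch_mean([rng.gauss(5.0, 1.0) for _ in range(2)]) for _ in range(20)
]
zero_init, first_init, corrected = ema_estimates(batch_means)

for step in (1, 5, 20):
    print(
        step,
        round(zero_init[step - 1], 3),
        round(first_init[step - 1], 3),
        round(corrected[step - 1], 3),
    )
```

In this sketch the zero-started EMA stays well below the true mean for the first updates, while the other two variants track it from the start. That early bias is one plausible reason the choice of initial value and a correction term could matter when each batch is small.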

Key words: Exponential moving average, Image classification, Normalization algorithm, Small batches

CLC Number: TP183