Computer Science ›› 2021, Vol. 48 ›› Issue (11A): 63-70. doi: 10.11896/jsjkx.201100032

• Intelligent Computing •

New Optimization Mechanism: Rain

LIU Hua-ling, PI Chang-peng, LIU Meng-yao, TANG Xin   

  1. School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201600, China
  • Online: 2021-11-10 Published: 2021-11-12
  • About author: LIU Hua-ling, born in 1964, Ph.D, professor. Her main research interests include privacy-preserving data mining and intelligent monitoring of Internet finance.
    PI Chang-peng, born in 1996, postgraduate. His main research interests include machine learning, deep learning and semantic recognition.

Abstract: The loss function of a traditional machine learning model is convex, so it has a global optimal solution that can be reached by stochastic gradient descent (SGD). In deep learning, however, the implicit form of the model function and the interchangeability of neurons within the same layer make the loss function non-convex. Traditional gradient descent cannot find the global optimum, and even more advanced optimizers such as SGDM, Adam, Adagrad and RMSprop, although they converge much faster, cannot escape local optima and therefore still fall short of practical needs. Existing optimization algorithms are largely incremental fixes for the defects or limitations of earlier ones; they bring slight gains, but their performance is inconsistent across data sets. This paper proposes a new optimization mechanism, Rain, which borrows the dropout mechanism of deep neural networks and integrates it into the optimization algorithm. Rain is not an improved version of any particular optimizer; it is a third-party mechanism independent of all optimization algorithms, yet it can be combined with any of them to improve their adaptability to the data set. The mechanism aims to improve performance on the training set; generalization to the test set is not its focus. Experiments are conducted with two models, Deep Crossing and FM, and five optimization algorithms on the Frappe and MovieLens data sets. The results show that models equipped with the Rain mechanism achieve a significantly lower training loss and faster convergence, but their performance on the test set is almost the same as that of the original models, i.e., their generalization is poor.
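
The abstract describes Rain only at a high level (dropout integrated into the optimization algorithm rather than into the network activations). The following is a minimal sketch of one possible reading, assuming Rain applies a dropout-style Bernoulli mask to the parameter update inside the optimizer step; the class name RainSGD, the rain_rate parameter, and the per-coordinate masking are illustrative assumptions, not the authors' published implementation.

import torch

class RainSGD(torch.optim.Optimizer):
    """Plain SGD whose per-coordinate updates are randomly dropped, dropout-style.
    Hypothetical sketch of the Rain idea; names and details are assumptions."""

    def __init__(self, params, lr=0.01, rain_rate=0.5):
        defaults = dict(lr=lr, rain_rate=rain_rate)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            lr = group["lr"]
            keep_prob = 1.0 - group["rain_rate"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                # Bernoulli mask: each coordinate of the update survives with
                # probability keep_prob, mimicking dropout inside the optimizer.
                mask = (torch.rand_like(p) < keep_prob).to(p.dtype)
                p.add_(mask * p.grad, alpha=-lr)

# Hypothetical usage, as a drop-in replacement for torch.optim.SGD:
# optimizer = RainSGD(model.parameters(), lr=0.01, rain_rate=0.5)

In this reading, the same mask could wrap the update of any base optimizer (SGDM, Adam, Adagrad, RMSprop), which is consistent with the abstract's claim that Rain is a third-party mechanism combinable with all optimization algorithms.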

Key words: Convergence speed, Deep learning, Dropout mechanism, Optimization algorithm, Rain mechanism

CLC Number: TP391
[1]MITCHELL T M.Machine learning[M].McGraw-Hill,1997.
[2]SILVER D,HUANG A,MADDISON C.Mastering the game of go with deep neural networks and tree search[J].Nature,2016,529(7587):484-489.
[3]HUANG Y,LIU D Y,HUANG K,et al.Overview on deep learning[J].CAAI Transactions on Intelligent Systems,2019,1(14):1-19.
[4]QIU Y P.Neural networks and deep learning[M].Beijing:China Machine Press,2020.
[5]DAUPHIN Y,PASCANU R,GULCEHRE C,et al.Identifying and attacking the saddle point problem in high-dimensional non-convex optimization[J].Advances in Neural Information Processing Systems,2014,27.
[6]LE Q V,NGIAM J,COATES A,et al.On optimization methods for deep learning[C]//International Conference on Machine Learning.2011.
[7]RUDER S.An overview of gradient descent optimization algorithms[Z].2016.
[8]YOUSOFF S N M,BAHARIN A,ABDULLAH A.A review on optimization algorithm for deep learning method in bioinformatics field[C]//2016 IEEE EMBS Conference on Biomedical Engineering and Sciences (IECBES).2017.
[9]FLETCHER R.Practical methods of optimization[J].Journal of the Operational Research Society,2013,33(7):675-676.
[10]NOCEDAL J.Updating quasi-newton matrices with limited storage[J].Mathematics of Computation,1980,35(151):773-782.
[11]GOYAL P,DOLLAR P,GIRSHICK R,et al.Accurate,large minibatch SGD:Training ImageNet in 1 hour[J].arXiv:1706.02677,2017.
[12]LOSHCHILOV I,HUTTER F.SGDR:Stochastic gradient descent with warm restarts[C]//International Joint Conference on Artificial Intelligence.2016.
[13]DUCHI J,HAZAN E,SINGER Y.Adaptive subgradient methods for online learning and stochastic optimization[J].Journal of Machine Learning Research,2011,12(7).
[14]TIELEMAN T,HINTON G.Lecture 6.5-rmsprop:Divide the gradient by a running average of its recent magnitude[C]//COURSERA:Neural Networks for Machine Learning.2012.
[15]ZEILER M D.Adadelta:An adaptive learning rate method[C]//International Joint Conference on Artificial Intelligence.2012.
[16]QIAN N.On the momentum term in gradient descent learning algorithms[J].Neural Netw,1999,12(1):145-151.
[17]NESTEROV Y.Gradient methods for minimizing composite functions[J].Mathematical Programming,2013,140(1):125-161.
[18]SUTSKEVER I,MARTENS J,DAHL G,et al.On the importance of initialization and momentum in deep learning[C]//International Conference on Machine Learning.2013.
[19]PASCANU R,MIKOLOV T,BENGIO Y.On the difficulty of training recurrent neural networks[C]//Proceedings of the International Conference on Machine Learning.2013.
[20]KINGMA D,BA J.Adam:A method for stochastic optimization[J].arXiv:1412.6980,2014.
[21]REDDI S J,KALE S,KUMAR S.On the convergence of adam and beyond[C]//Proceedings of the International Conference on Learning Representations.2019.
[22]LUO L,XIONG Y,LIU Y,et al.Adaptive gradient methodswith dynamic bound of learning rate[C]//Proceedings of the International Conference on Learning Representations.2019.
[23]ZHANG M R,LUCAS J,HINTON G,et al.Lookahead optimizer:k steps forward,1 step back[C]//Proceedings of the Neural Information Processing Systems.2019.
[24]LIU L,JIANG H,HE P,et al.On the variance of the adaptive learning rate and beyond[C]//Proceedings of the International Conference on Learning Representations.2019.
[25]RUMELHART D E,HINTON G E,WILLIAMS R J.Learning representations by back-propagating errors[J].Nature,1986,323(6088):533-536.
[26]HINTON G E,SRIVASTAVA N,KRIZHEVSKY A,et al.Improving neural networks by preventing co-adaptation of feature detectors[J].Computer Science,2012,3(4):212-223.
[27]RENDLE S.Factorization machines[C]//IEEE InternationalConference on Data Mining.2011.
[28]SHAN Y,HOENS T R,JIAO J,et al.Deep crossing:Web-scale modeling without manually crafted combinatorial features[C]//22nd ACM SIGKDD International Conference.2016.
[29]XIAO J,YE H,HE X,et al.Attentional factorization machines:Learning the weight of feature interactions via attention networks[C]//26th International Joint Conference on Artificial Intelligence.2017.