Distributed Convolutional Neural Networks Based on Approximate Newton-type Method

doi:10.11896/j.issn.1002-137X.2019.07.028

Computer Science, 2019, Vol. 46, Issue (7): 180-185.

WANG Ya-hui^1,2, LIU Bo^3, YUAN Xiao-tong^1,2

(Department of Information and Control, Nanjing University of Information Science and Technology, Nanjing 210044, China)^1
(Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing 210044, China)^2
(Department of Computer Science, Rutgers University, New Jersey 08854, USA)^3

Received: 2018-08-02  Online: 2019-07-15  Published: 2019-07-15
Author profiles: WANG Ya-hui (1993-), female, master's student; research interests: machine learning and distributed deep learning; E-mail: 20162283677@nuist.edu.cn. LIU Bo (1984-), male, Ph.D. candidate; research interests: machine learning and computer vision. YUAN Xiao-tong (1980-), male, postdoctoral researcher, professor, CCF member; research interests: machine learning and computer vision; E-mail: xtyan1980@qmail.com (corresponding author).
Funding: Supported by the National Natural Science Foundation of China (61876090, 61522308)

Abstract

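Most machine learning problems (model learning) ultimately reduce to optimization problems. Optimization uses mathematical methods to find the best ways to solve such problems, and it plays an increasingly important role in scientific computing and engineering analysis. With the rapid development of deep networks, the scale of both data and parameters keeps growing. Although GPU hardware, network architectures and training methods have advanced significantly in recent years, it is still difficult for a single computer to train deep network models efficiently on large datasets. The distributed approximate Newton-type method, one effective approach to this problem, has been introduced into research on distributed neural networks. It partitions the full set of training samples evenly across multiple computers, which reduces the amount of data each computer must process, and the computers communicate with one another to complete the training task collaboratively. This paper proposes distributed deep learning based on the approximate Newton-type method and trains the same network with the distributed approximate Newton-type algorithm (DANE) on an increasing number of GPUs: as the number of GPUs grows by powers of 2, the training time falls by nearly the same powers of 2, so each doubling of the GPU count nearly halves the training time. This matches the ultimate goal of the research: to implement the approximate Newton-type method on an existing distributed framework and train neural networks in a distributed manner, improving training efficiency while preserving estimation accuracy.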

Key words: Distributed framework, Approximate Newton-type method, Neural network, Optimization problem
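The abstract describes the distributed approximate Newton-type method (DANE) only at a high level: samples are split evenly across machines, which then communicate to train one model jointly. As a rough illustration, the sketch below applies the DANE update of Shamir, Srebro and Zhang to a toy two-parameter least-squares problem instead of a CNN; it is an assumed stand-in, not the paper's implementation. In each round the local gradients are averaged into the global gradient g, each worker i solves w_i = argmin_w f_i(w) - (∇f_i(w_prev) - η·g)ᵀw + (μ/2)·||w - w_prev||², and the w_i are averaged. For a quadratic f_i with Hessian A_i this subproblem has the closed form w_i = w_prev - η·(A_i + μI)⁻¹·g, a Newton-like step that uses only local second-order information. The helper names (`split_evenly`, `local_quadratic`, `grad`, `solve2`, `dane`), the synthetic data and the defaults η = 1, μ = 0 are hypothetical choices made for this example.

```python
import random


def split_evenly(samples, m):
    """Split samples into m equal shards (assumes len(samples) is divisible by m)."""
    k = len(samples) // m
    return [samples[i * k:(i + 1) * k] for i in range(m)]


def local_quadratic(shard):
    """For f_i(w) = 1/(2 n_i) * sum((w[0]*x + w[1] - y)**2), return (A_i, b_i)
    such that grad f_i(w) = A_i w - b_i; A_i is the local Hessian."""
    n = len(shard)
    sxx = sum(x * x for x, _ in shard) / n
    sx = sum(x for x, _ in shard) / n
    sxy = sum(x * y for x, y in shard) / n
    sy = sum(y for _, y in shard) / n
    return [[sxx, sx], [sx, 1.0]], [sxy, sy]


def grad(A, b, w):
    """Gradient A w - b of the quadratic 1/2 w^T A w - b^T w."""
    return [A[0][0] * w[0] + A[0][1] * w[1] - b[0],
            A[1][0] * w[0] + A[1][1] * w[1] - b[1]]


def solve2(M, v):
    """Solve the 2x2 system M z = v with Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(M[1][1] * v[0] - M[0][1] * v[1]) / det,
            (M[0][0] * v[1] - M[1][0] * v[0]) / det]


def dane(shards, rounds=20, eta=1.0, mu=0.0):
    """DANE-style iterations on equal-size shards; returns the averaged parameters."""
    stats = [local_quadratic(s) for s in shards]  # computed once per worker
    m = len(stats)
    w = [0.0, 0.0]
    for _ in range(rounds):
        # Communication 1: average the local gradients into the global gradient g
        # (a plain mean is correct only because all shards have the same size).
        local_grads = [grad(A, b, w) for A, b in stats]
        g = [sum(gi[k] for gi in local_grads) / m for k in range(2)]
        # Local solve: for a quadratic f_i the DANE subproblem's minimiser is
        # w - eta * (A_i + mu*I)^{-1} g, a Newton-like step with the local Hessian.
        local_ws = []
        for A, _ in stats:
            M = [[A[0][0] + mu, A[0][1]], [A[1][0], A[1][1] + mu]]
            step = solve2(M, g)
            local_ws.append([w[k] - eta * step[k] for k in range(2)])
        # Communication 2: average the local solutions to get the next iterate.
        w = [sum(wi[k] for wi in local_ws) / m for k in range(2)]
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
    data = [(x, 3.0 * x + 1.0) for x in xs]  # noise-free targets from w* = (3, 1)
    print(dane(split_evenly(data, 4)))  # expected to be close to [3.0, 1.0]
```

With a single shard the local Hessian equals the global one, so one round is an exact Newton step. With several shards the local Hessians only approximate the global Hessian, which is why the method is "approximate Newton" and needs a few rounds; the more alike the shards are statistically, the closer each local step is to the exact one.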


CLC number: TP181

Cite this article: WANG Ya-hui, LIU Bo, YUAN Xiao-tong. Distributed Convolutional Neural Networks Based on Approximate Newton-type Method[J]. Computer Science, 2019, 46(7): 180-185. https://doi.org/10.11896/j.issn.1002-137X.2019.07.028
