Computer Science (计算机科学) ›› 2019, Vol. 46 ›› Issue (7): 180-185. doi: 10.11896/j.issn.1002-137X.2019.07.028
王雅慧1,2,刘博3,袁晓彤1,2
WANG Ya-hui1,2,LIU Bo3,YUAN Xiao-tong1,2
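Abstract: Most machine learning problems can ultimately be reduced to optimization problems (model learning). Optimization uses mathematical methods to study how to find the best solutions to a wide range of problems, and it plays an increasingly important role in scientific computing and engineering analysis. With the rapid development of deep networks, the scale of data and parameters keeps growing. Despite major recent advances in GPU hardware, network architectures, and training methods, it remains difficult for a single computer to train deep network models efficiently on large datasets. The distributed approximate Newton method has been introduced into distributed neural network research as one effective way to address this problem. It divides the full sample set evenly among multiple computers, which reduces the amount of data each computer must process, and the computers communicate with one another and cooperate to complete the training task. This paper proposes distributed deep learning based on the approximate Newton method. When the same network is trained with the distributed approximate Newton method, the training time shrinks by nearly a power of 2 as the number of GPUs grows by powers of 2. This is consistent with the ultimate goal of the study: while preserving estimation accuracy, to implement the approximate Newton method on an existing distributed framework and train neural networks in a distributed manner, thereby improving training efficiency.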
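To make the procedure described in the abstract concrete, the following is a minimal single-process sketch of a distributed approximate Newton iteration. It is an illustration under stated assumptions, not the authors' implementation: it assumes the DANE-style update of Shamir, Srebro and Zhang, replaces the neural network with a least-squares objective so that each worker's local subproblem has a closed-form solution, and simulates the workers with a Python list instead of real GPUs. The function name `dane_least_squares` and the parameters `eta`, `mu`, and `iters` are hypothetical names chosen for this example.

```python
import numpy as np

def dane_least_squares(shards, eta=1.0, mu=1e-3, iters=20):
    """Simulated DANE-style iteration on evenly split least-squares data.

    shards: list of (X_i, y_i) pairs, one per simulated worker.
    eta, mu: step size and proximal weight (illustrative defaults).
    """
    d = shards[0][0].shape[1]
    w = np.zeros(d)
    # Each worker's local Hessian; constant for a quadratic loss.
    hessians = [X.T @ X / len(y) for X, y in shards]
    for _ in range(iters):
        # Communication round 1: average the local gradients.
        local_grads = [X.T @ (X @ w - y) / len(y) for X, y in shards]
        g = np.mean(local_grads, axis=0)
        # Each worker solves its regularized local subproblem. For a
        # quadratic loss the minimizer is w - eta * (H_i + mu*I)^{-1} g.
        local_w = [w - eta * np.linalg.solve(H + mu * np.eye(d), g)
                   for H in hessians]
        # Communication round 2: average the local solutions.
        w = np.mean(local_w, axis=0)
    return w

# Synthetic data, split evenly across 4 simulated workers.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=2000)
shards = list(zip(np.array_split(X, 4), np.array_split(y, 4)))

w_dane = dane_least_squares(shards)
w_star = np.linalg.lstsq(X, y, rcond=None)[0]  # centralized reference
print(np.max(np.abs(w_dane - w_star)))
```

Each worker touches only its own shard, and only two vectors of the model's dimension are exchanged per iteration, which is the property that lets the per-machine workload fall as machines are added. For a deep network the local subproblem would have no closed form and would presumably be solved approximately with an inner optimizer.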