Computer Science, 2019, Vol. 46, Issue (3): 9-18. doi: 10.11896/j.issn.1002-137X.2019.03.002

• Surveys

Survey of Distributed Machine Learning Platforms and Algorithms

SHU Na¹, LIU Bo¹, LIN Wei-wei², LI Peng-fei¹

  1. School of Computer, South China Normal University, Guangzhou 510631, China
  2. School of Computer Science and Technology, South China University of Technology, Guangzhou 510640, China
  • Received: 2018-04-06  Revised: 2018-06-28  Online: 2019-03-15  Published: 2019-03-22

Abstract: Distributed machine learning spreads tasks involving large-scale data and computation across multiple machines. Its core idea is “divide and conquer”, which effectively speeds up large-scale computation and reduces overhead. As one of the most important fields of machine learning, distributed machine learning has attracted wide attention from researchers in many fields. In view of its research significance and practical value, this paper summarized mainstream platforms such as Spark, MXNet, Petuum, TensorFlow and PyTorch, and analyzed their characteristics from several perspectives. It then explained in depth how machine learning algorithms are implemented under data parallelism and model parallelism, and reviewed distributed computing models, namely the bulk synchronous parallel model, the asynchronous parallel model and the delayed asynchronous parallel model. Finally, it discussed future work on distributed machine learning in five aspects: platform improvement, algorithm optimization, network communication, scalability of algorithms for large-scale data, and fault tolerance.
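As a rough illustration of the data-parallel, bulk synchronous pattern the abstract refers to, the following Python sketch simulates workers sequentially in a single process. It assumes a linear least-squares model, round-robin sharding of the training rows, and full-batch gradient descent; every function name (`make_rows`, `local_gradient`, `bsp_gd`, `may_proceed`) is hypothetical and not taken from any of the surveyed platforms. The `may_proceed` helper assumes the delayed asynchronous model behaves like bounded-staleness (stale synchronous parallel) execution, in which a worker may run at most `staleness` iterations ahead of the slowest one.

```python
import random


def make_rows(n, true_w, seed=0):
    """Hypothetical helper: noise-free synthetic data with y = true_w . x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in true_w]
        rows.append((x, sum(wi * xi for wi, xi in zip(true_w, x))))
    return rows


def local_gradient(w, part):
    """Summed squared-error gradient over one worker's shard (not yet averaged)."""
    grad = [0.0] * len(w)
    for x, y in part:
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += residual * xj
    return grad


def bsp_gd(rows, num_workers=4, lr=0.2, steps=200):
    """Data-parallel gradient descent under bulk synchronous parallel (BSP) rules."""
    # Data parallelism: each worker holds a round-robin shard of the rows
    # and a full replica of the model parameters w.
    parts = [rows[k::num_workers] for k in range(num_workers)]
    dim = len(rows[0][0])
    w = [0.0] * dim
    for _ in range(steps):
        # Compute phase: every worker works on its own shard
        # (sequential here; in a real cluster these would run concurrently).
        partial = [local_gradient(w, part) for part in parts]
        # Barrier + aggregation: the update waits for all workers, sums their
        # partial gradients, and divides by the total row count.
        total = [sum(g[j] for g in partial) for j in range(dim)]
        w = [wj - lr * tj / len(rows) for wj, tj in zip(w, total)]
    return w


def may_proceed(worker_clock, all_clocks, staleness):
    """Bounded-staleness rule: a worker may start its next iteration only if it
    is at most `staleness` iterations ahead of the slowest worker.
    staleness=0 reproduces BSP; float("inf") never blocks (fully asynchronous)."""
    return worker_clock - min(all_clocks) <= staleness


if __name__ == "__main__":
    rows = make_rows(200, [2.0, -1.0, 0.5])
    print(bsp_gd(rows, num_workers=4))
```

Because the update sums every worker's partial gradient before dividing by the total row count, the sketch matches single-machine gradient descent up to floating-point rounding; the barrier before each update is what makes it BSP. Setting `staleness=0` in `may_proceed` recovers the BSP rule, while `float("inf")` removes the wait entirely, as in the asynchronous model.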

Key words: Algorithm analysis, Big data, Distributed machine learning, Machine learning, Parallel computing

CLC Number: TP301