Computer Science (计算机科学), 2018, Vol. 45, Issue 11A: 527-531.
余昌发, 程学林, 杨小虎
YU Chang-fa, CHEN Xue-lin, YANG Xiao-hu
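Abstract: This paper presents the design and implementation of a distributed TensorFlow platform based on Kubernetes. Distributed TensorFlow suffers from complex environment configuration, unevenly distributed underlying physical resources, low training efficiency, and long model development cycles. To address these problems, the paper proposes a method for containerizing TensorFlow and uses the Kubernetes container PaaS platform to schedule and manage the TensorFlow containers in a unified way. The approach combines the strengths of both systems: Kubernetes provides a reliable, stable computing environment in which TensorFlow's support for heterogeneous hardware can be fully exploited, which greatly lowers the difficulty of large-scale use. The paper also establishes an agile management platform that provides fast resource allocation, one-click deployment, startup within seconds, dynamic scaling, and efficient training for distributed TensorFlow.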
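To make the idea of containerized, Kubernetes-scheduled TensorFlow replicas more concrete, the following is a minimal sketch and not the authors' implementation. It generates one Kubernetes Pod manifest per parameter server and worker, and passes the cluster layout to each container through the `TF_CONFIG` environment variable that TensorFlow's distributed runtime conventionally reads. The job name `mnist`, the image `example.com/tf-train:latest`, port 2222, and the helper names are all hypothetical. The sketch also assumes that each replica is reachable under its Pod name, for example through a matching Service, which is not shown.

```python
import json


def build_cluster_spec(job_name, num_ps, num_workers, port=2222):
    """Return a TensorFlow-style cluster spec: role -> list of host:port.

    Host names are hypothetical; they assume each replica is exposed
    under its Pod name (e.g. by a matching Kubernetes Service).
    """
    return {
        "ps": [f"{job_name}-ps-{i}:{port}" for i in range(num_ps)],
        "worker": [f"{job_name}-worker-{i}:{port}" for i in range(num_workers)],
    }


def build_pod_manifest(job_name, role, index, cluster, image, port=2222):
    """Build a Pod manifest (as a dict) for one TensorFlow replica."""
    # TF_CONFIG tells the replica the full cluster layout and its own task.
    tf_config = json.dumps(
        {"cluster": cluster, "task": {"type": role, "index": index}}
    )
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"{job_name}-{role}-{index}",
            "labels": {"job": job_name, "role": role},
        },
        "spec": {
            "restartPolicy": "OnFailure",
            "containers": [
                {
                    "name": "tensorflow",
                    "image": image,  # hypothetical training image
                    "ports": [{"containerPort": port}],
                    "env": [{"name": "TF_CONFIG", "value": tf_config}],
                }
            ],
        },
    }


def build_all_manifests(job_name, num_ps, num_workers, image):
    """Return manifests for all parameter servers, then all workers."""
    cluster = build_cluster_spec(job_name, num_ps, num_workers)
    return [
        build_pod_manifest(job_name, role, i, cluster, image)
        for role, hosts in cluster.items()
        for i in range(len(hosts))
    ]


if __name__ == "__main__":
    # One parameter server and two workers for a hypothetical MNIST job.
    for manifest in build_all_manifests(
        "mnist", 1, 2, "example.com/tf-train:latest"
    ):
        print(json.dumps(manifest, indent=2))
```

Each printed manifest could be saved to a file and submitted with `kubectl apply -f`. Changing `num_workers` and regenerating the manifests is one simple way to picture the dynamic scaling mentioned in the abstract, although the platform described in the paper may handle scaling differently.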