Computer Science, 2023, 50(2): 3-12. doi: 10.11896/jsjkx.20221100135

• Edge Intelligence Collaborative Technologies and Frontier Applications •

Optimization and Deployment of Memory-Intensive Operations in Deep Learning Model on Edge

Peng XU, Jianxin ZHAO, Chi Harold LIU   

  1. Department of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2022-11-15; Revised: 2023-01-19; Online: 2023-02-15; Published: 2023-02-22

  • Contact: Jianxin ZHAO (jianxin.zhao@bit.edu.cn)
  • About author: Peng XU (xupeng_mii@163.com)
  • Supported by: National Natural Science Foundation of China (U21A20519)

Abstract: As edge devices such as smart home appliances, mobile phones, and wearable devices generate ever larger amounts of data, deploying machine learning models across edge devices has become crucial for many applications. The execution speed of a deployed model is a key factor in ensuring quality of service. In highly heterogeneous edge deployment scenarios, deep learning compilation is a promising approach to this problem: models are defined in domain-specific languages (DSLs), and efficient code is generated for different hardware devices. However, two aspects have not yet been thoroughly investigated: the optimization of memory-intensive operations, and the heterogeneity of deployment targets. To address them, this work proposes a system that optimizes memory-intensive operations, optimizes the distribution of subgraphs, and enables the compilation and deployment of DNN models on multiple targets. Evaluation results demonstrate the performance of the proposed system.
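Memory-intensive operations (element-wise operators such as bias addition, activation, and scaling) do little arithmetic per element, so their cost is dominated by reading and writing tensors. A widely used remedy in deep learning compilers is operator fusion, which keeps intermediate values in registers instead of writing them back to memory. The sketch below is a generic illustration of that idea, not the authors' system: the functions `run_unfused` and `run_fused`, the operator chain, and the traffic model (each unfused kernel reads its full input and writes its full output once) are all hypothetical simplifications.

```python
from typing import Callable, List, Tuple

# An element-wise operator maps one float to one float (e.g. bias add, ReLU, scaling).
ElementwiseOp = Callable[[float], float]


def run_unfused(x: List[float], ops: List[ElementwiseOp]) -> Tuple[List[float], int]:
    """Apply each op as a separate 'kernel' (hypothetical traffic model).

    Assumption: every kernel reads its whole input tensor from memory and
    writes its whole output tensor back, so each op costs 2 * len(x) accesses.
    """
    traffic = 0
    buf = x
    for op in ops:
        out = [op(v) for v in buf]  # one full pass over memory per operator
        traffic += 2 * len(buf)     # read input + write output
        buf = out                   # intermediate tensor is materialized
    return buf, traffic


def run_fused(x: List[float], ops: List[ElementwiseOp]) -> Tuple[List[float], int]:
    """Apply the whole chain in a single 'kernel'.

    Each element is loaded once, passed through every op while held in a
    local variable (standing in for a register), and stored once, so total
    traffic is 2 * len(x) regardless of chain length.
    """
    out = []
    for v in x:
        for op in ops:
            v = op(v)
        out.append(v)
    return out, 2 * len(x)


if __name__ == "__main__":
    data = [-2.0, -0.5, 0.0, 1.5, 3.0]
    chain = [
        lambda v: v + 1.0,      # bias add (hypothetical bias of 1.0)
        lambda v: max(v, 0.0),  # ReLU
        lambda v: v * 0.5,      # scaling
    ]
    y1, t1 = run_unfused(data, chain)
    y2, t2 = run_fused(data, chain)
    print(y1 == y2, t1, t2)  # prints: True 30 10
```

Both versions produce identical results, but under this simple model the three-operator chain drops from 30 memory accesses to 10. Real compilers must also handle reductions, broadcasting, and hardware-specific scheduling, which this sketch deliberately omits.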

Key words: Memory optimization, Deep learning compiler, Computation optimization, Model deployment, Edge computing

CLC number: TP311.5