Optimization and Deployment of Memory-Intensive Operations in Deep Learning Model on Edge

doi:10.11896/jsjkx.20221100135

Abstract

As a large amount of data is increasingly generated from edge devices, such as smart homes, mobile phones, and wearable devices, it becomes crucial for many applications to deploy machine learning models across edge devices. The execution speed of the deployed model is a key element in ensuring service quality. For highly heterogeneous edge deployment scenarios, deep learning compilation is a novel approach to this problem: it defines models using domain-specific languages (DSLs) and generates efficient code implementations for different hardware devices. However, two aspects have not yet been thoroughly investigated. The first is the optimization of memory-intensive operations, and the second is the heterogeneity of deployment targets. To address them, we propose a system that optimizes memory-intensive operations, optimizes subgraph distribution, and enables the compilation and deployment of DNN models on multiple targets. Evaluation results demonstrate the performance of the proposed system.
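To make the term "memory-intensive operation" concrete, the following is a minimal, generic sketch of why such operations benefit from compiler-level optimization. It is not the system proposed in the paper. It assumes a hypothetical element-wise chain y = relu(x * scale + bias) and a simplified cost model that counts only element loads and stores to main memory, ignoring caches and the scalar parameters. Run as three separate kernels, each operator reads and writes a full tensor; fused into one kernel, the intermediates stay in registers.

```python
from typing import List, Tuple

# Hypothetical element-wise chain: y = relu(x * scale + bias).
# Each operator does one arithmetic op per element it loads and stores,
# so its cost is dominated by memory traffic (memory-intensive).

def unfused(x: List[float], scale: float, bias: float) -> Tuple[List[float], int]:
    """Three separate 'kernels', each materializing its output in memory."""
    n = len(x)
    traffic = 0                          # simplified model: element loads + stores
    t1 = [v * scale for v in x]          # kernel 1: multiply
    traffic += 2 * n                     # read x, write t1
    t2 = [v + bias for v in t1]          # kernel 2: add
    traffic += 2 * n                     # read t1, write t2
    y = [max(v, 0.0) for v in t2]        # kernel 3: relu
    traffic += 2 * n                     # read t2, write y
    return y, traffic

def fused(x: List[float], scale: float, bias: float) -> Tuple[List[float], int]:
    """One fused 'kernel': intermediates never leave registers."""
    y = [max(v * scale + bias, 0.0) for v in x]
    traffic = 2 * len(x)                 # read x once, write y once
    return y, traffic

x = [-2.0, -0.5, 0.0, 1.5, 3.0]
y_u, traffic_u = unfused(x, 2.0, 1.0)
y_f, traffic_f = fused(x, 2.0, 1.0)
assert y_u == y_f                        # fusion does not change the result
print(traffic_u, traffic_f)              # 30 10
```

Under this toy model, fusing the three operators cuts memory traffic by a factor of three while producing identical results; real compilers must also handle broadcasting, reductions, and hardware-specific limits, which this sketch omits.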

Key words: Memory optimization, Deep compiler, Computation optimization, Model deployment, Edge computing

CLC Number: TP311.5

Peng XU, Jianxin ZHAO, Chi Harold LIU. Optimization and Deployment of Memory-Intensive Operations in Deep Learning Model on Edge[J].Computer Science, 2023, 50(2): 3-12.
