Computer Science ›› 2025, Vol. 52 ›› Issue (2): 299-309. doi: 10.11896/jsjkx.240900101

• Computer Networks •

  • Corresponding author: GUO Bin (guob@nwpu.edu.cn)
  • About author: LIN Zheng (zhenglin@mail.nwpu.edu.cn)

Adaptive Operator Parallel Partitioning Method for Heterogeneous Embedded Chips in AIoT

LIN Zheng1, LIU Sicong1, GUO Bin1, DING Yasan1, YU Zhiwen1,2   

  1. College of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
    2. Harbin Engineering University, Harbin 150001, China
  • Received: 2024-09-16  Revised: 2024-11-02  Online: 2025-02-15  Published: 2025-02-17
  • About author: LIN Zheng, born in 2002, postgraduate. His main research interests include ubiquitous computing and mobile crowd sensing.
    GUO Bin, born in 1980, Ph.D., Ph.D. supervisor, is a member of CCF (No. E200019107S). His main research interests include ubiquitous computing and mobile crowd sensing.
  • Supported by:
    National Science Fund for Distinguished Young Scholars of China(62025205) and National Natural Science Foundation of China(62032020, 62302017).

Abstract: With the continuous improvement in quality of life and the rapid development of technology, mobile devices such as smartphones have become ubiquitous worldwide. Against this backdrop, the deployment and application of deep neural networks on mobile devices have become a research hotspot. Deep neural networks not only drive significant progress in mobile applications, but also place higher demands on the energy-efficiency management of battery-powered mobile devices. Meanwhile, the rise of heterogeneous processors in today's mobile devices brings new challenges to energy-efficiency optimization: distributing computation across different processors to parallelize and accelerate deep neural network inference does not necessarily reduce energy consumption, and may even increase it. To address this issue, this paper proposes an energy-efficient adaptive parallel computing scheduling system for deep neural networks. The system comprises a runtime energy consumption analyzer and an online operator partitioning executor, which together adjust operator allocation to changing device conditions, optimizing the energy efficiency of computation on heterogeneous mobile processors while maintaining high responsiveness. Experimental results show that, compared with baseline methods, the proposed system reduces the average energy consumption and latency of deep neural networks on mobile devices by 5.19% and 9.0%, and the maximum energy consumption and latency by 18.35% and 21.6%, respectively.
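The core idea the abstract describes, splitting one operator's work between heterogeneous processors so that predicted energy is minimized under a latency budget, can be illustrated with a minimal sketch. Everything below is hypothetical: the linear FLOPS-based latency model, the constant-power energy model, and all profiler numbers are placeholders standing in for the paper's runtime energy consumption analyzer, not its actual implementation.

```python
def split_cost(flops, ratio, cpu_gflops, gpu_gflops, cpu_w, gpu_w):
    """Latency and energy when `ratio` of the work runs on the CPU and the
    rest on the GPU, with both branches executing in parallel."""
    t_cpu = flops * ratio / (cpu_gflops * 1e9)
    t_gpu = flops * (1.0 - ratio) / (gpu_gflops * 1e9)
    latency = max(t_cpu, t_gpu)               # the slower branch dominates
    energy = t_cpu * cpu_w + t_gpu * gpu_w    # each chip draws power only while busy
    return latency, energy

def best_split(flops, cpu_gflops, gpu_gflops, cpu_w, gpu_w, budget_s, steps=100):
    """Grid-search the energy-minimal split ratio that meets the latency
    budget; if no ratio meets it, fall back to the lowest-latency ratio."""
    best = None
    for i in range(steps + 1):
        r = i / steps
        lat, en = split_cost(flops, r, cpu_gflops, gpu_gflops, cpu_w, gpu_w)
        # Feasible splits (False, energy) always sort before infeasible
        # ones (True, latency), so feasibility is preferred, then energy.
        key = (lat > budget_s, en if lat <= budget_s else lat)
        if best is None or key < best[0]:
            best = (key, r)
    return best[1]

# Made-up profiler numbers: a 1 GFLOP operator, a 20 GFLOPS CPU at 1 W,
# an 80 GFLOPS GPU at 3 W, and an 11.1 ms latency budget.
ratio = best_split(1e9, 20.0, 80.0, 1.0, 3.0, 0.0111)
# The GPU is more energy-hungry here, so the scheduler offloads just enough
# work to the CPU to keep the GPU branch within the latency budget.
```

An online executor in the spirit of the paper would re-run such a search whenever the profiled device conditions (processor load, thermal state, measured power) change, rather than fixing the split once.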

Key words: Deep neural networks, Mobile device, Energy efficiency optimization, Heterogeneous processors, Energy consumption prediction

CLC Number: TP391