Computer Science, 2025, Vol. 52, Issue 2: 299-309. DOI: 10.11896/jsjkx.240900101
LIN Zheng1, LIU Sicong1, GUO Bin1, DING Yasan1, YU Zhiwen1,2
Abstract: With the continuous improvement of living standards and the rapid pace of technological development, mobile devices such as smartphones have become ubiquitous worldwide. Against this background, the deployment and application of deep neural networks (DNNs) on mobile devices has become a research hotspot. DNNs have driven significant progress in mobile applications, but they also place higher demands on the energy management of battery-powered mobile devices. The rise of heterogeneous processors in today's mobile devices poses new challenges for energy-efficiency optimization: distributing computation across different processors to parallelize and accelerate DNN inference does not necessarily reduce energy consumption, and may even increase it. To address this problem, this paper proposes an energy-efficiency-optimized adaptive parallel computing scheduling system for deep neural networks. The system comprises a runtime energy profiler and an online operator-partitioning executor, which dynamically adjust operator allocation according to changing device conditions, optimizing compute energy efficiency on heterogeneous mobile processors while maintaining high responsiveness. Experimental results show that, compared with baseline methods, the proposed system reduces the average energy consumption and average latency of DNN inference on mobile devices by 5.19% and 9.0%, and the maximum energy consumption and maximum latency by 18.35% and 21.6%, respectively.
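To make the scheduling idea concrete, below is a minimal, hypothetical Python sketch of energy-aware operator assignment on heterogeneous processors. All names (Operator, runtime_scale, schedule), the per-operator latency/energy profiles, and the greedy weighted cost model are illustrative assumptions; the paper's actual runtime energy profiler and online operator-partitioning executor are not reproduced here.

```python
# Illustrative sketch only: a greedy, energy-aware assignment of DNN
# operators to heterogeneous processors. Not the paper's algorithm.
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class Operator:
    name: str
    # Offline-profiled estimates per processor: processor -> (latency_ms, energy_mJ)
    profile: Dict[str, Tuple[float, float]]

def runtime_scale(processor: str) -> float:
    """Stand-in for a runtime energy profiler: returns a scaling factor
    reflecting current device conditions (thermal state, DVFS level, load).
    A real profiler would read hardware counters or power rails; these
    static factors are assumptions for illustration."""
    return {"cpu": 1.0, "gpu": 1.2}[processor]

def schedule(ops, alpha: float = 0.5):
    """Greedily assign each operator to the processor that minimizes a
    weighted energy-latency cost: alpha * energy + (1 - alpha) * latency."""
    plan = {}
    for op in ops:
        best = min(
            op.profile,
            key=lambda p: alpha * op.profile[p][1] * runtime_scale(p)
                          + (1 - alpha) * op.profile[p][0],
        )
        plan[op.name] = best
    return plan

if __name__ == "__main__":
    ops = [
        Operator("conv1", {"cpu": (8.0, 12.0), "gpu": (3.0, 9.0)}),
        Operator("fc1",   {"cpu": (2.0, 3.0),  "gpu": (2.5, 6.0)}),
    ]
    print(schedule(ops))  # -> {'conv1': 'gpu', 'fc1': 'cpu'}
```

In this toy model, alpha trades energy against latency; re-running schedule() whenever runtime_scale changes mimics adapting the operator partition to dynamic device conditions, which is the role the paper assigns to its online operator-partitioning executor.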