Computer Science ›› 2026, Vol. 53 ›› Issue (4): 269-276. doi: 10.11896/jsjkx.250900024
ZHOU Haojie1, WU Xiaoning2, GAO Zhiqiang3, HAN Rui1, ZHANG Qinglong1, LIU Chi1, CHEN Zheng2, ZHAO Yu2, WANG Shuo2
Abstract: In recent years, ViT models have been widely deployed in edge-side vision applications thanks to their strong image understanding capability. When inference runs on resource-constrained edge devices, a ViT model must be scaled effectively according to the available resources to reach the best accuracy-latency trade-off. However, existing inference-time model scaling techniques typically operate only at whole-model granularity, which discards key information and therefore requires more computation and inference latency to reach the same accuracy. To address this, we propose LegoViT, which identifies scalable model blocks within the feed-forward networks of ViT models to support block-grained model scaling at runtime. Compared with model-grained scaling methods, experimental results show that LegoViT reduces ViT memory footprint by 22.37%, computation by 21.1%, and inference latency by 61.05% on average.