Computer Science ›› 2026, Vol. 53 ›› Issue (4): 269-276. doi: 10.11896/jsjkx.250900024

• Computer Graphics & Multimedia •

  • Corresponding author: HAN Rui (hanrui@bit.edu.cn)
  • About author: (3280165225@qq.com)

LegoViT: Block-grained Scaling Techniques for ViT Models in Edge-side Visual Inference

ZHOU Haojie1, WU Xiaoning2, GAO Zhiqiang3, HAN Rui1, ZHANG Qinglong1, LIU Chi1, CHEN Zheng2, ZHAO Yu2, WANG Shuo2   

  1. 1 School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
    2 TravelSky Technology Limited, Beijing 101318, China
    3 Engineering University of PAP, Xi’an 710018, China
  • Received: 2025-09-02 Revised: 2025-12-29 Published: 2026-04-15 Online: 2026-04-08
  • About author: ZHOU Haojie, born in 2002, postgraduate, is a member of CCF (No. A05537G). His main research interest is edge intelligence.
    HAN Rui, born in 1985, assistant professor, Ph.D. supervisor. His main research interests include cloud computing and edge intelligence.
  • Supported by:
    National Natural Science Foundation of China(62272046,62132019,62472033,61872337),Special Program for High-Quality Development of the Ministry of Industry and Information Technology(CEIEC-20240) and Cooperative Project with the Northern Institute of Automatic Control Technology and Cultivation Project of Beijing Institute of Technology(2023CX01017).


Abstract: In recent years, Vision Transformer (ViT) models have been widely deployed in edge-side visual applications because of their powerful image understanding capabilities. To achieve the optimal inference accuracy-latency balance in resource-constrained edge-side inference, ViT models must be scaled effectively according to the available resources. However, existing inference model scaling techniques can only perform scaling at whole-model granularity, which loses critical information and requires more computational resources or higher inference latency to reach the same accuracy. This paper proposes LegoViT, a method that identifies scalable model blocks in the feedforward networks of ViT models, thereby supporting runtime block-grained model scaling. Compared with model-granularity scaling methods, LegoViT reduces the memory footprint of ViT models by 22.37%, the computational overhead by 21.1%, and the average inference latency by 61.05%.
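The abstract describes scaling a ViT's feedforward network (FFN) at block granularity rather than swapping the whole model. The paper's actual block-identification procedure is not shown on this page; the sketch below only illustrates the general idea under stated assumptions: the FFN hidden dimension is partitioned into equal-sized blocks, and at runtime a subset of blocks is kept active, shrinking compute and memory roughly in proportion to the blocks dropped. All names (`ScalableFFN`, `active_blocks`) are illustrative, not LegoViT's API.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, the activation used in ViT FFNs
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

class ScalableFFN:
    """Toy ViT feedforward network whose hidden units are grouped into
    blocks that can be switched off at runtime (illustrative only)."""

    def __init__(self, d_model=768, d_hidden=3072, num_blocks=8, seed=0):
        rng = np.random.default_rng(seed)
        self.block = d_hidden // num_blocks          # hidden units per block
        self.w1 = rng.standard_normal((d_model, d_hidden)) * 0.02
        self.w2 = rng.standard_normal((d_hidden, d_model)) * 0.02

    def forward(self, x, active_blocks):
        # Keep only the hidden columns belonging to the selected blocks;
        # FLOPs and weight reads shrink with the number of dropped blocks.
        cols = np.concatenate([np.arange(b * self.block, (b + 1) * self.block)
                               for b in active_blocks])
        h = gelu(x @ self.w1[:, cols])
        return h @ self.w2[cols, :]

ffn = ScalableFFN()
x = np.ones((1, 768)) * 0.01
full = ffn.forward(x, active_blocks=range(8))    # full-capacity FFN
scaled = ffn.forward(x, active_blocks=range(6))  # drop 2 of 8 blocks at runtime
```

Because every block variant shares the same input/output width, variants can be selected per request without reloading the model, which is what makes runtime accuracy-latency trade-offs possible.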

Key words: Edge side, ViT, Inference optimization, Block-grained scaling

CLC number: 

  • TP391
[1]DOSOVITSKIY A,BEYER L,KOLESNIKOV A,et al.An image is worth 16×16 words:Transformers for image recognition at scale[J].arXiv:2010.11929,2020.
[2]LIU Z,LIN Y,CAO Y,et al.Swin transformer:Hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:10012-10022.
[3]FANG B,ZENG X,ZHANG M.Nestdnn:Resource-aware multi-tenant on-device deep learning for continuous mobile vision[C]//Proceedings of the 24th Annual International Conference on Mobile Computing and Networking.2018:115-127.
[4]HAN R,ZHANG Q,LIU C H,et al.Legodnn:block-grained scaling of deep neural networks for mobile vision[C]//Proceedings of the 27th Annual International Conference on Mobile Computing and Networking.2021:406-419.
[5]HAN S,MAO H,DALLY W J.Deep compression:Compressing deep neural networks with pruning,trained quantization and huffman coding[J].arXiv:1510.00149,2015.
[6]LI H,HU C,JIANG J,et al.JALAD:Joint accuracy-and latency-aware deep structure decoupling for edge-cloud execution[C]//2018 IEEE 24th International Conference on Parallel and Distributed Systems(ICPADS).IEEE,2018:671-678.
[7]KIM Y D,PARK E,YOO S,et al.Compression of deep convolutional neural networks for fast and low power mobile applications[J].arXiv:1511.06530,2015.
[8]LI H,KADAV A,DURDANOVIC I,et al.Pruning filters for efficient convnets[J].arXiv:1608.08710,2016.
[9]TANG Q,ZHANG B,LIU J,et al.Dynamic token pruning in plain vision transformers for semantic segmentation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2023:777-786.
[10]KONG Z,DONG P,MA X,et al.Spvit:Enabling faster vision transformers via latency-aware soft token pruning[C]//European Conference on Computer Vision.Cham:Springer Nature Switzerland,2022:620-640.
[11]SONG Z,XU Y,HE Z,et al.Cp-vit:Cascade vision transformer pruning via progressive sparsity prediction[J].arXiv:2203.04570,2022.
[12]XU G,HAO J,SHEN L,et al.Lgvit:Dynamic early exiting for accelerating vision transformer[C]//Proceedings of the 31st ACM International Conference on Multimedia.2023:9103-9114.
[13]LIU W,ZHOU P,ZHAO Z,et al.Fastbert:a self-distilling bert with adaptive inference time[J].arXiv:2004.02178,2020.
[14]SCHUSTER T,FISCH A,GUPTA J,et al.Confident adaptive language modeling[J].Advances in Neural Information Processing Systems,2022,35:17456-17472.
[15]MA X,ZHOU A,ZHANG S,et al.Cooperative service caching and workload scheduling in mobile edge computing[C]//IEEE INFOCOM 2020—IEEE Conference on Computer Communications.IEEE,2020:2076-2085.
[16]LIU Y,HE Q,ZHENG D,et al.Data caching optimization in the edge computing environment[J].IEEE Transactions on Services Computing,2020,15(4):2074-2085.
[17]ZENG F,ZHANG K,WU L,et al.Efficient caching in vehicular edge computing based on edge-cloud collaboration[J].IEEE Transactions on Vehicular Technology,2022,72(2):2468-2481.
[18]FAN W,GAO L,SU Y,et al.Joint DNN partition and resource allocation for task offloading in edge-cloud-assisted IoT environments[J].IEEE Internet of Things Journal,2023,10(12):10146-10159.
[19]CHEN H,QIN W,WANG L.Task partitioning and offloading in IoT cloud-edge collaborative computing framework:a survey[J].Journal of Cloud Computing,2022,11(1):86.
[20]LI X,QIN Y,ZHOU H,et al.An intelligent collaborative inference approach of service partitioning and task offloading for deep learning based service in mobile edge computing networks[J].Transactions on Emerging Telecommunications Technologies,2021,32(9):e4263.
[21]HAN S,MAO H,DALLY W J.Deep compression:Compressing deep neural networks with pruning,trained quantization and huffman coding[J].arXiv:1510.00149,2015.
[22]OH Y H,QUAN Q,KIM D,et al.A portable,automatic data quantizer for deep neural networks[C]//Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques.2018:1-14.
[23]REAGEN B,WHATMOUGH P,ADOLF R,et al.Minerva:Enabling low-power,highly-accurate deep neural network accelerators[J].ACM SIGARCH Computer Architecture News,2016,44(3):267-278.
[24]YANG T J,CHEN Y H,SZE V.Designing energy-efficient convolutional neural networks using energy-aware pruning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:5687-5695.
[25]CHEN T,CHENG Y,GAN Z,et al.Chasing sparsity in vision transformers:An end-to-end exploration[J].Advances in Neural Information Processing Systems,2021,34:19974-19988.
[26]LI Y,YU Y,ZHANG Q,et al.Losparse:Structured compression of large language models based on low-rank and sparse approximation[C]//International Conference on Machine Learning.PMLR,2023:20336-20350.
[27]ASHKBOOS S,CROCI M L,NASCIMENTO M G,et al.SliceGPT:Compress large language models by deleting rows and columns[J].arXiv:2401.15024,2024.
[28]AN Y,ZHAO X,YU T,et al.Fluctuation-based adaptive structured pruning for large language models[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2024:10865-10873.
[29]XU X,YAN K,HAN S,et al.Learning-based edge-device collaborative dnn inference in iovt networks[J].IEEE Internet of Things Journal,2023,11(5):7989-8004.