Computer Science ›› 2026, Vol. 53 ›› Issue (4): 269-276. doi: 10.11896/jsjkx.250900024

• Computer Graphics & Multimedia •

LegoViT: Block-grained Scaling Techniques for ViT Models in Edge-side Visual Inference

ZHOU Haojie1, WU Xiaoning2, GAO Zhiqiang3, HAN Rui1, ZHANG Qinglong1, LIU Chi1, CHEN Zheng2, ZHAO Yu2, WANG Shuo2   

  1 School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
    2 TravelSky Technology Limited, Beijing 101318, China
    3 Engineering University of PAP, Xi’an 710018, China
  • Received: 2025-09-02 Revised: 2025-12-29 Online: 2026-04-15 Published: 2026-04-08
  • About author: ZHOU Haojie, born in 2002, postgraduate, is a member of CCF (No.A05537G). His main research interest is edge intelligence.
    HAN Rui, born in 1985, assistant professor, Ph.D. supervisor. His main research interests include cloud computing and edge intelligence.
  • Supported by:
    National Natural Science Foundation of China (62272046, 62132019, 62472033, 61872337), Special Program for High-Quality Development of the Ministry of Industry and Information Technology (CEIEC-20240), and Cooperative Project with the Northern Institute of Automatic Control Technology and Cultivation Project of Beijing Institute of Technology (2023CX01017).

Abstract: In recent years, Vision Transformer (ViT) models have been widely deployed in edge-side visual applications because of their powerful image understanding capabilities. To achieve an optimal accuracy-latency balance in resource-constrained edge-side inference, ViT models must be scaled effectively according to the available resources. However, existing model scaling techniques can only operate at the granularity of the entire model, which discards critical information and often requires more computational resources or higher inference latency to reach equivalent accuracy. This paper proposes LegoViT, a method that identifies scalable model blocks within the feedforward networks of ViT models, thereby supporting block-grained model scaling at runtime. Comparative experiments demonstrate that LegoViT reduces the memory footprint of ViT models by 22.37%, computational overhead by 21.1%, and inference latency by 61.05% on average.
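The block-grained scaling idea summarized above can be illustrated with a toy sketch: treat the hidden dimension of a ViT feedforward network (FFN) as a set of blocks, and derive a smaller variant by keeping only a subset of them, shrinking both memory and compute. This is a minimal sketch under assumed conventions (contiguous blocks, a naive keep-the-first-k selection, ReLU instead of GELU); it does not reproduce LegoViT's actual block identification or retraining procedure.

```python
import numpy as np

def make_ffn(d_model=8, d_hidden=32, seed=0):
    # A ViT FFN is two linear layers: d_model -> d_hidden -> d_model.
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((d_hidden, d_model))
    W2 = rng.standard_normal((d_model, d_hidden))
    return W1, W2

def ffn_forward(x, W1, W2):
    # ReLU stands in for the usual GELU to keep the sketch simple.
    h = np.maximum(W1 @ x, 0.0)
    return W2 @ h

def scale_ffn(W1, W2, n_blocks, keep_blocks):
    # Partition the hidden dimension into n_blocks contiguous blocks and
    # keep the first keep_blocks of them. Dropping a hidden block removes
    # matching rows of W1 and columns of W2, so the variant is still a
    # valid FFN with the same input/output shape.
    block = W1.shape[0] // n_blocks
    k = keep_blocks * block
    return W1[:k, :], W2[:, :k]

W1, W2 = make_ffn()
W1s, W2s = scale_ffn(W1, W2, n_blocks=4, keep_blocks=3)
y = ffn_forward(np.ones(8), W1s, W2s)   # output shape unchanged: (8,)
ratio = (W1s.size + W2s.size) / (W1.size + W2.size)
print(ratio)  # 0.75: keeping 3 of 4 blocks removes 25% of FFN parameters
```

At runtime, a deployment could hold the full block set and select how many blocks to load per layer based on the resources available, which is the kind of accuracy-latency knob the abstract describes.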

Key words: Edge side, ViT, Inference optimization, Block-grained scaling

CLC Number: TP391

[1]DOSOVITSKIY A,BEYER L,KOLESNIKOV A,et al.An image is worth 16×16 words:Transformers for image recognition at scale[J].arXiv:2010.11929,2020.
[2]LIU Z,LIN Y,CAO Y,et al.Swin transformer:Hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:10012-10022.
[3]FANG B,ZENG X,ZHANG M.Nestdnn:Resource-aware multi-tenant on-device deep learning for continuous mobile vision[C]//Proceedings of the 24th Annual International Conference on Mobile Computing and Networking.2018:115-127.
[4]HAN R,ZHANG Q,LIU C H,et al.Legodnn:block-grained scaling of deep neural networks for mobile vision[C]//Proceedings of the 27th Annual International Conference on Mobile Computing and Networking.2021:406-419.
[5]HAN S,MAO H,DALLY W J.Deep compression:Compressing deep neural networks with pruning,trained quantization and huffman coding[J].arXiv:1510.00149,2015.
[6]LI H,HU C,JIANG J,et al.JALAD:Joint accuracy-and latency-aware deep structure decoupling for edge-cloud execution[C]//2018 IEEE 24th International Conference on Parallel and Distributed Systems(ICPADS).IEEE,2018:671-678.
[7]KIM Y D,PARK E,YOO S,et al.Compression of deep convolutional neural networks for fast and low power mobile applications[J].arXiv:1511.06530,2015.
[8]LI H,KADAV A,DURDANOVIC I,et al.Pruning filters for efficient convnets[J].arXiv:1608.08710,2016.
[9]TANG Q,ZHANG B,LIU J,et al.Dynamic token pruning in plain vision transformers for semantic segmentation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2023:777-786.
[10]KONG Z,DONG P,MA X,et al.Spvit:Enabling faster vision transformers via latency-aware soft token pruning[C]//European Conference on Computer Vision.Cham:Springer Nature Switzerland,2022:620-640.
[11]SONG Z,XU Y,HE Z,et al.Cp-vit:Cascade vision transformer pruning via progressive sparsity prediction[J].arXiv:2203.04570,2022.
[12]XU G,HAO J,SHEN L,et al.Lgvit:Dynamic early exiting for accelerating vision transformer[C]//Proceedings of the 31st ACM International Conference on Multimedia.2023:9103-9114.
[13]LIU W,ZHOU P,ZHAO Z,et al.Fastbert:a self-distilling bert with adaptive inference time[J].arXiv:2004.02178,2020.
[14]SCHUSTER T,FISCH A,GUPTA J,et al.Confident adaptive language modeling[J].Advances in Neural Information Processing Systems,2022,35:17456-17472.
[15]MA X,ZHOU A,ZHANG S,et al.Cooperative service caching and workload scheduling in mobile edge computing[C]//IEEE INFOCOM 2020—IEEE Conference on Computer Communications.IEEE,2020:2076-2085.
[16]LIU Y,HE Q,ZHENG D,et al.Data caching optimization in the edge computing environment[J].IEEE Transactions on Services Computing,2020,15(4):2074-2085.
[17]ZENG F,ZHANG K,WU L,et al.Efficient caching in vehicular edge computing based on edge-cloud collaboration[J].IEEE Transactions on Vehicular Technology,2022,72(2):2468-2481.
[18]FAN W,GAO L,SU Y,et al.Joint DNN partition and resource allocation for task offloading in edge-cloud-assisted IoT environments[J].IEEE Internet of Things Journal,2023,10(12):10146-10159.
[19]CHEN H,QIN W,WANG L.Task partitioning and offloading in IoT cloud-edge collaborative computing framework:a survey[J].Journal of Cloud Computing,2022,11(1):86.
[20]LI X,QIN Y,ZHOU H,et al.An intelligent collaborative inference approach of service partitioning and task offloading for deep learning based service in mobile edge computing networks[J].Transactions on Emerging Telecommunications Technologies,2021,32(9):e4263.
[21]HAN S,MAO H,DALLY W J.Deep compression:Compressing deep neural networks with pruning,trained quantization and huffman coding[J].arXiv:1510.00149,2015.
[22]OH Y H,QUAN Q,KIM D,et al.A portable,automatic data quantizer for deep neural networks[C]//Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques.2018:1-14.
[23]REAGEN B,WHATMOUGH P,ADOLF R,et al.Minerva:Enabling low-power,highly-accurate deep neural network accelerators[J].ACM SIGARCH Computer Architecture News,2016,44(3):267-278.
[24]YANG T J,CHEN Y H,SZE V.Designing energy-efficient convolutional neural networks using energy-aware pruning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:5687-5695.
[25]CHEN T,CHENG Y,GAN Z,et al.Chasing sparsity in vision transformers:An end-to-end exploration[J].Advances in Neural Information Processing Systems,2021,34:19974-19988.
[26]LI Y,YU Y,ZHANG Q,et al.Losparse:Structured compression of large language models based on low-rank and sparse approximation[C]//International Conference on Machine Learning.PMLR,2023:20336-20350.
[27]ASHKBOOS S,CROCI M L,NASCIMENTO M G,et al.Slicegpt:Compress large language models by deleting rows and columns[J].arXiv:2401.15024,2024.
[28]AN Y,ZHAO X,YU T,et al.Fluctuation-based adaptive structured pruning for large language models[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2024:10865-10873.
[29]XU X,YAN K,HAN S,et al.Learning-based edge-device collaborative dnn inference in iovt networks[J].IEEE Internet of Things Journal,2023,11(5):7989-8004.