Review of SAM-based Vision Applications

doi:10.11896/jsjkx.250600115

Abstract

Abstract: The Segment Anything Model (SAM), a general-purpose foundation model for visual segmentation, brings new opportunities to computer vision through its strong zero-shot generalization and interactive segmentation capabilities. Trained on large-scale data and equipped with a flexible prompting mechanism, SAM adapts efficiently to cross-domain tasks and can be applied quickly to a wide range of vision tasks, such as medical image analysis and autonomous driving. To better understand the performance bottlenecks and technical challenges SAM faces in various vision applications, this paper explores how it can be optimized for vision tasks. First, the core framework of SAM is introduced; then, improved models are classified and their applicable scenarios are analyzed. On this basis, the current state of research on SAM in different vision tasks is reviewed and summarized, and experimental comparisons are presented on the common datasets and evaluation metrics of each application scenario. Finally, the limitations and future directions of SAM in different vision tasks are analyzed and discussed.

Key words: SAM, Zero-shot generalization, Interactive segmentation, Visual task, Cross-domain tasks

CLC Number: TP391

ZHANG Xu, WANG Anzhi, YANG Chenbang, WU Jintao. Review of SAM-based Vision Applications[J].Computer Science, 2026, 53(6A): 250600115-9.
