Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230600186-6. doi: 10.11896/jsjkx.230600186

• Image Processing & Multimedia Technology •

Occluded Video Instance Segmentation Method Based on Feature Fusion of Tracking and Detection in Time Sequence

ZHENG Shenhai1,2, GAO Xi1, LIU Pengwei1, LI Weisheng1,2   

  1 College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
    2 Chongqing Key Laboratory of Image Cognition (Chongqing University of Posts and Telecommunications), Chongqing 400065, China
  • Published: 2024-06-06
  • About author: ZHENG Shenhai, born in 1988, Ph.D, associate professor. His main research interests include machine learning, pattern recognition and medical image computing.
  • Supported by:
    National Natural Science Foundation of China (61902046), Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-K202200606) and Natural Science Foundation of Chongqing, China (2022NSCQ-MSX3746).

Abstract: Video instance segmentation is a visual task that has emerged in recent years. It extends image instance segmentation with temporal characteristics, aiming to segment the objects in every frame while tracking them across frames. With the rapid development of the mobile Internet and artificial intelligence, a huge amount of video data is being generated. However, because of shooting angles, rapid motion and partial occlusion, objects in videos often appear fragmented or blurred, which makes it challenging to segment targets accurately and to process and analyze them. A survey of the literature and practical experiments shows that existing video instance segmentation methods perform poorly under occlusion. To address this problem, this paper proposes an improved occluded video instance segmentation algorithm that boosts segmentation performance by fusing Transformer-based temporal features of tracking and detection. To strengthen the network's ability to learn spatial position information, the algorithm introduces the time dimension into the Transformer network and exploits the interdependence and mutual promotion among object detection, tracking and segmentation in videos. A fusion tracking module and a detection temporal feature module are proposed to effectively aggregate the tracking offsets of objects across frames, improving segmentation performance in occluded environments. The effectiveness of the proposed method is verified by experiments on the OVIS and YouTube-VIS datasets. Compared with current benchmark methods, the proposed method achieves better segmentation accuracy, further demonstrating its superiority.
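
The abstract describes the architecture only at a high level. The short PyTorch sketch below illustrates the general idea in a hedged way: per-frame detection features for a video clip are aggregated over the time dimension with attention, and per-instance tracking offsets are predicted and fused back into those features before segmentation. All names and shapes here (DetectionTemporalFeature, FusionTracking, 100 instance queries, 256-dimensional features, a 5-frame window) are illustrative assumptions and do not reproduce the authors' implementation.

import torch
import torch.nn as nn


class DetectionTemporalFeature(nn.Module):
    """Aggregates per-frame detection features over a short temporal window
    by letting each instance query attend to its own features across frames."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (num_frames, num_queries, dim) for one video clip
        x = feats.permute(1, 0, 2)             # (num_queries, num_frames, dim)
        out, _ = self.attn(x, x, x)            # self-attention over the time axis
        x = self.norm(x + out)                 # residual connection + layer norm
        return x.permute(1, 0, 2)              # back to (num_frames, num_queries, dim)


class FusionTracking(nn.Module):
    """Predicts inter-frame tracking offsets per instance and fuses them
    with the temporally aggregated detection features."""

    def __init__(self, dim: int):
        super().__init__()
        self.offset_head = nn.Linear(dim, 2)   # hypothetical (dx, dy) offset per instance
        self.fuse = nn.Sequential(nn.Linear(dim + 2, dim), nn.ReLU(inplace=True))

    def forward(self, feats: torch.Tensor):
        offsets = self.offset_head(feats)                       # (num_frames, num_queries, 2)
        fused = self.fuse(torch.cat([feats, offsets], dim=-1))  # (num_frames, num_queries, dim)
        return fused, offsets


if __name__ == "__main__":
    clip_feats = torch.randn(5, 100, 256)      # 5 frames, 100 instance queries, 256-d features
    temporal = DetectionTemporalFeature(dim=256)
    tracker = FusionTracking(dim=256)
    fused, offsets = tracker(temporal(clip_feats))
    print(fused.shape, offsets.shape)          # torch.Size([5, 100, 256]) torch.Size([5, 100, 2])

Treating each instance query as a batch element and attending only over the frame axis keeps the attention cost linear in the number of queries, which is one plausible way to realize the kind of temporal aggregation and tracking-offset fusion the abstract describes.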

Key words: Video instance segmentation, Object detection, Object tracking, Feature in time sequence, Occluded instance

CLC Number: TP391