Computer Science ›› 2023, Vol. 50 ›› Issue (9): 227-234. doi: 10.11896/jsjkx.220700204

• Database & Big Data & Data Science •

Deep Learning Based Salient Object Detection in Infrared Video

ZHU Ye, HAO Yingguang, WANG Hongyu

  1. School of Information and Communication Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
  • Received: 2022-07-24 Revised: 2022-11-08 Online: 2023-09-15 Published: 2023-09-01
  • Corresponding author: HAO Yingguang (yghao@dlut.edu.cn)
  • About author: ZHU Ye (zhuye01020928@163.com), born in 2000, postgraduate. Her main research interests include salient object detection in infrared videos.
    HAO Yingguang, born in 1968, associate professor. His main research interests include modeling of complex time-varying systems and image processing algorithms.
  • Supported by:
    Fundamental Research Funds for the Central Universities of Ministry of Education of China (DUT21GF204).

Abstract: As infrared video backgrounds grow more complex and data volumes increase, the performance of traditional salient object detection methods degrades significantly. To improve salient object detection in infrared images, this paper proposes a deep learning-based salient object detection model for infrared video. The model consists of four main modules: a spatial feature extraction module, a temporal feature extraction module, a residual skip connection block, and a pixel-wise classifier. First, the spatial feature extraction module extracts spatial saliency features from the raw input video frames. Second, the temporal feature extraction module extracts temporal saliency features and models spatio-temporal coherence. Finally, the spatio-temporal features, together with the low-level spatial features that the residual skip connection block carries over from the spatial module, are fed into the pixel-wise classifier to generate the final salient object detection results. During training, BCEloss and DICEloss are combined to improve training stability. Tests on the infrared video dataset OTCBVS and on infrared video sequences with complex backgrounds show that the proposed model obtains accurate salient object detection results in both cases, and that it is robust and generalizes well.

Key words: Infrared video, Salient object detection, Deep learning, Convolutional neural network, Loss function
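The abstract states that BCEloss and DICEloss are combined during training but gives no formula or weighting. The following is a minimal sketch of one common way to combine them, written in plain Python over flattened per-pixel probabilities. The equal weights, the `smooth` term, the clamping constant `eps`, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over per-pixel probabilities in [0, 1]."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def dice_loss(pred, target, smooth=1.0):
    """Soft Dice loss: 1 - (2 * overlap + smooth) / (sum(pred) + sum(target) + smooth)."""
    overlap = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * overlap + smooth) / (denom + smooth)


def combined_loss(pred, target, bce_weight=1.0, dice_weight=1.0):
    """Weighted sum of BCE and Dice; the weights here are assumed, not from the paper."""
    return bce_weight * bce_loss(pred, target) + dice_weight * dice_loss(pred, target)


if __name__ == "__main__":
    mask = [1.0, 0.0, 1.0, 0.0]       # ground-truth saliency mask (flattened)
    guess = [0.9, 0.2, 0.7, 0.1]      # predicted saliency probabilities
    print(combined_loss(guess, mask))
```

BCE scores each pixel independently, while Dice measures region overlap and is less sensitive to the imbalance between small salient targets and large backgrounds, which is a plausible reason for pairing the two.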

CLC Number: TP751