Computer Science, 2026, Vol. 53, Issue (2): 227-235. doi: 10.11896/jsjkx.241200082
黄靖1,2, 王腾1, 刘健1, 胡凯1, 彭鑫1, 黄亚敏3,4, 文元桥3,4
HUANG Jing1,2, WANG Teng1, LIU Jian1, HU Kai1, PENG Xin1, HUANG Yamin3,4, WEN Yuanqiao3,4
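Abstract: Underwater acoustic image data are scarce and supervision for underwater acoustic targets is limited, so existing object detection algorithms are difficult to apply directly. To address this problem, an open-set object detection method for underwater acoustic images, USD (Underwater Sonar Detection), is proposed on the basis of DETR (End-to-End Object Detection with Transformers). First, in the cross-modal feature fusion encoding module, a multi-scale deformable attention mechanism iterates over the image features separately, helping the network attend selectively and automatically to important information while reducing computation; a multi-head self-attention mechanism iterates over the text features, improving the model's ability to model the sequence globally. Then, a bidirectional attention mechanism fuses the text and image features, attending to the relationships in both directions between the input sequences so that the network learns more complex text-image relationships. Finally, in the image-text feature decoding module, the image features output by the Encoder module are used to initialize the queries, and the DN (DeNoising) method is used during training to address slow model convergence. Experiments show that the proposed method achieves an average detection precision of 77.5% on a self-built underwater acoustic image dataset, which is higher than that of the other detection methods compared, while also realizing open-set object detection with good detection performance.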
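The bidirectional text-image fusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head, no learned query/key/value projections, one shared similarity matrix used in both directions, and a plain residual connection. The function name `bidirectional_fusion` and the helper names are hypothetical, chosen only for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    """Transpose a list-of-lists matrix."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def bidirectional_fusion(image_feats, text_feats):
    """Fuse image tokens (N_img x d) and text tokens (N_txt x d).

    Assumption: both modalities already share the same feature size d,
    and no learned projections are applied (a simplification).
    """
    d = len(image_feats[0])
    scale = 1.0 / math.sqrt(d)
    # Shared similarity matrix: logits[i][j] = <image_i, text_j> / sqrt(d)
    logits = [
        [scale * sum(x * y for x, y in zip(img, txt)) for txt in text_feats]
        for img in image_feats
    ]
    # Image-to-text direction: each image token attends over the text tokens.
    img_to_txt = [softmax(row) for row in logits]
    # Text-to-image direction: each text token attends over the image tokens.
    txt_to_img = [softmax(row) for row in transpose(logits)]
    img_update = matmul(img_to_txt, text_feats)   # N_img x d
    txt_update = matmul(txt_to_img, image_feats)  # N_txt x d
    # Residual connection keeps the original features of each modality.
    fused_img = [[a + b for a, b in zip(r, u)] for r, u in zip(image_feats, img_update)]
    fused_txt = [[a + b for a, b in zip(r, u)] for r, u in zip(text_feats, txt_update)]
    return fused_img, fused_txt


if __name__ == "__main__":
    # Toy inputs: three image tokens and two text tokens with d = 2.
    image = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    text = [[0.5, 0.5], [1.0, -1.0]]
    fused_image, fused_text = bidirectional_fusion(image, text)
    print(len(fused_image), len(fused_text))
```

Sharing one similarity matrix between the two directions keeps the image-to-text and text-to-image attention weights consistent with each other. A practical model would typically add learned projections, multiple heads, and normalization layers around this core.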