Computer Science, 2023, Vol. 50, Issue (6): 194-199. doi: 10.11896/jsjkx.220700145

• Computer Graphics & Multimedia •

PSwin: Edge Detection Algorithm Based on Swin Transformer

HU Mingyang1,2, GUO Yan1,2, JIN Yangshuang2

  1. Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu 215123, China
  2. School of Software Engineering, University of Science and Technology of China, Suzhou, Jiangsu 215123, China
  • Received: 2022-07-15 Revised: 2022-10-27 Online: 2023-06-15 Published: 2023-06-06
  • Corresponding author: GUO Yan (guoyan@ustc.edu.cn)
  • About author: HU Mingyang (myoung@mail.ustc.edu.cn), born in 1997, master. His main research interests include computer vision and natural language processing. GUO Yan, born in 1981, lecturer. Her main research interests include information security, blockchain and NLP.

Abstract: As a traditional computer vision algorithm, edge detection has been widely used in real-world scenarios such as license plate recognition and optical character recognition. When it serves as the basis for higher-level algorithms such as object detection and semantic segmentation, edge detection also supports applications such as urban security and autonomous driving. A good edge detection algorithm can therefore effectively improve the efficiency and accuracy of these computer vision tasks. The difficulty of edge extraction lies in the variation in object size and edge detail, so an edge extraction algorithm must handle edges at different scales effectively. This paper applies the Transformer to edge extraction for the first time and proposes a novel feature pyramid network that makes full use of the multi-scale, multi-level features of the backbone network. Because PSwin uses a self-attention mechanism, it can extract global structural information from images more effectively than convolutional neural network architectures. Evaluated on the BSDS500 dataset, the proposed PSwin edge detection algorithm achieves state-of-the-art performance, with an ODS F-measure of 0.826 and an OIS F-measure of 0.841.

Key words: Edge detection, Feature pyramid network, Visual attention, Transfer learning, BSDS500
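The ODS and OIS scores quoted in the abstract are the two standard BSDS500 summary metrics, and they differ only in how the binarization threshold is chosen. The minimal Python sketch below illustrates that difference, assuming the per-image, per-threshold boundary-matching counts have already been computed. The official BSDS benchmark obtains those counts with a tolerance-based pixel matching and interpolates between thresholds; both steps are omitted here. The `Counts` class, the function names, and the toy numbers are hypothetical illustrations, not PSwin's code or the benchmark's API.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Counts:
    """Boundary-matching counts for one image at one threshold (assumed precomputed)."""
    matched_gt: int    # ground-truth edge pixels matched by some prediction
    total_gt: int      # all ground-truth edge pixels
    matched_pred: int  # predicted edge pixels matched to ground truth
    total_pred: int    # all predicted edge pixels at this threshold


def f_measure(counts: Sequence[Counts]) -> float:
    """F-measure from counts pooled over the given (image, threshold) entries."""
    matched_gt = sum(c.matched_gt for c in counts)
    total_gt = sum(c.total_gt for c in counts)
    matched_pred = sum(c.matched_pred for c in counts)
    total_pred = sum(c.total_pred for c in counts)
    recall = matched_gt / total_gt if total_gt else 0.0
    precision = matched_pred / total_pred if total_pred else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def ods(per_image: List[List[Counts]]) -> float:
    """Optimal Dataset Scale: one threshold shared by every image (no interpolation)."""
    n_thresholds = len(per_image[0])
    return max(f_measure([img[t] for img in per_image]) for t in range(n_thresholds))


def ois(per_image: List[List[Counts]]) -> float:
    """Optimal Image Scale: each image keeps its own best threshold, then counts are pooled."""
    best = [max(img, key=lambda c: f_measure([c])) for img in per_image]
    return f_measure(best)


# Hypothetical toy data: 2 images x 2 thresholds.
example = [
    [Counts(8, 10, 8, 10), Counts(5, 10, 5, 5)],  # image A: per-image F = 0.80, 0.67
    [Counts(2, 10, 2, 10), Counts(6, 10, 6, 8)],  # image B: per-image F = 0.20, 0.67
]
ods_score = ods(example)  # pooled F is 0.50 at threshold 0 and 2/3 at threshold 1
ois_score = ois(example)  # A keeps threshold 0, B keeps threshold 1, pooled F = 14/19
print(f"ODS={ods_score:.3f}, OIS={ois_score:.3f}")
```

Because OIS lets every image pick its own threshold while ODS must commit to one threshold for the whole dataset, OIS is typically the higher of the two, which is consistent with the 0.826 versus 0.841 reported for PSwin.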

CLC number: TP391