R-YOLOv5: Auto-cutting, Rotated Text Detection Model

doi:10.11896/jsjkx.210900185

Abstract

YOLOv5 is currently one of the best-performing object detection models. Text detection in natural scenes raises problems such as text lines of varying length, tilted text, and uneven light and shadow. To address them, this paper proposes the R-YOLOv5 (Rotated-YOLOv5) text detection model, which modifies YOLOv5 to overcome its weaknesses in text detection. First, a text segmentation model based on an affine algorithm is incorporated. According to the length of the string and the shape of the text area, the text area of the image is cut in equal proportions into multiple single-character blocks. This addresses YOLOv5's poor performance on text objects, which lack closed contour lines. Then, using a rotated convolutional layer, a rotated max-pooling layer and an improved anchor box, we propose a rotated intersection over union (RIoU) loss function that strengthens angle learning, so that rotation and tilt features can be extracted. The original and improved models are tested on ICDAR2019-LSVT. Experimental results show that the detection performance of R-YOLOv5 is significantly improved. However, because the model has more layers, its training and detection efficiency are slightly lower than those of the original model. Owing to the strengths of YOLOv5, R-YOLOv5 also achieves much better detection performance and efficiency than the other models compared.
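
The auto-cutting step cuts a text region into equal-proportion single-character blocks. The abstract does not give the paper's exact procedure, so the following is only a minimal sketch under stated assumptions. The text region is assumed to be a four-point quadrilateral given in (top-left, top-right, bottom-right, bottom-left) order, and the character count is assumed to be known. The split runs along the longer side, which is one simple way to take the "shape of the text area" into account. The function name `cut_text_quad` is hypothetical. Each cut point is a linear (affine) combination of two corners. Affine maps preserve length ratios along lines, so for a parallelogram-shaped region this gives exactly the affine image of an equal split of an upright rectangle.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# Corner order assumed: (top-left, top-right, bottom-right, bottom-left)
Quad = Tuple[Point, Point, Point, Point]


def lerp(p: Point, q: Point, t: float) -> Point:
    """Point at fraction t along segment p -> q (an affine combination)."""
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)


def edge_len(p: Point, q: Point) -> float:
    """Euclidean length of segment p -> q."""
    return ((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2) ** 0.5


def cut_text_quad(quad: Quad, num_chars: int) -> List[Quad]:
    """Hypothetical sketch: split a text quadrilateral into num_chars
    equal-proportion character blocks along its longer side."""
    if num_chars < 1:
        raise ValueError("num_chars must be >= 1")
    tl, tr, br, bl = quad
    width = (edge_len(tl, tr) + edge_len(bl, br)) / 2
    height = (edge_len(tl, bl) + edge_len(tr, br)) / 2
    blocks: List[Quad] = []
    for i in range(num_chars):
        t0, t1 = i / num_chars, (i + 1) / num_chars
        if width >= height:
            # Horizontal text line: step along the top and bottom edges.
            blocks.append((lerp(tl, tr, t0), lerp(tl, tr, t1),
                           lerp(bl, br, t1), lerp(bl, br, t0)))
        else:
            # Vertical text line: step along the left and right edges.
            blocks.append((lerp(tl, bl, t0), lerp(tr, br, t0),
                           lerp(tr, br, t1), lerp(tl, bl, t1)))
    return blocks


if __name__ == "__main__":
    # A tilted four-character line: every block inherits the same slant.
    for block in cut_text_quad(((0, 0), (40, 10), (37, 22), (-3, 12)), 4):
        print(block)
```

Splitting on the corner points keeps every block a quadrilateral with the region's own tilt. The blocks can therefore serve as rotated single-character targets without re-estimating an angle per character. A real pipeline would probably also need to handle characters of unequal width, which this equal-split sketch ignores.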

Key words: Computer vision, Object detection, Text detection, Convolutional neural network, Rotation tilt, Loss function, YOLO

CLC Number: TP389.1

RAN Yu, ZHANG Li. R-YOLOv5: Auto-cutting, Rotated Text Detection Model[J]. Computer Science, 2022, 49(11A): 210900185-6.
