Computer Science, 2022, Vol. 49, Issue (11A): 210900185-6. doi: 10.11896/jsjkx.210900185

• Image Processing & Multimedia Technology •

R-YOLOv5: Auto-cutting, Rotated Text Detection Model

RAN Yu, ZHANG Li

  1. School of Information Technology and Management, University of International Business and Economics, Beijing 100029, China
  • Online: 2022-11-10 Published: 2022-11-21
  • Corresponding author: ZHANG Li (zhangli_amy@uibe.edu.cn)
  • About author: (2920763948@qq.com)
    RAN Yu, born in 1999, postgraduate. His main research interests include deep learning and object detection.
    ZHANG Li, born in 1972, Ph.D., professor. Her main research interests include machine learning, deep learning, and business intelligence.

Abstract: YOLOv5 is one of the better current models for object detection, but it handles scene text poorly: text lines vary widely in length, text has no closed contour that can be localized precisely, and text in natural scenes is often tilted or affected by light and shadow. To address these weaknesses, the R-YOLOv5 (Rotated-YOLOv5) text detection model is proposed. First, a text segmentation model based on an affine algorithm is incorporated. According to the length of the string and the shape of the text area, the text area of the image is cut into multiple single-character blocks in equal proportions, which addresses the poor anchor-box fitting of YOLOv5 caused by text objects lacking closed contour lines. Then, using a rotated convolutional layer, a rotated max-pooling layer, and improved anchor boxes, a rotated intersection over union (RIoU) loss function that strengthens angle learning is proposed to extract the rotation and tilt features of text. The original and improved models are compared on ICDAR2019-LSVT. Experimental results show that the detection performance of R-YOLOv5 is clearly improved; however, because the model has more layers, its training and detection speeds are slightly lower than those of the original model. Compared with other models, R-YOLOv5 achieves much better detection performance and detection speed, which the authors attribute to the inherent advantages of YOLOv5.

Key words: Computer vision, Object detection, Text detection, Convolutional neural network, Rotation tilt, Loss function, YOLO
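
The equal-proportion cutting step described in the abstract can be pictured with a minimal sketch. The abstract does not give the details of the affine-based segmentation model, so the following only assumes that a text line is annotated as a quadrilateral (corners in the order top-left, top-right, bottom-right, bottom-left) and that the number of characters is known; the names `cut_text_region`, `lerp`, and `Quad` are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# Assumed corner order: top-left, top-right, bottom-right, bottom-left.
Quad = Tuple[Point, Point, Point, Point]


def lerp(p: Point, q: Point, t: float) -> Point:
    """Return the point a fraction t of the way from p to q."""
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)


def cut_text_region(quad: Quad, n_chars: int) -> List[Quad]:
    """Split a text-line quadrilateral into n_chars equal cells.

    Illustrative assumption: cuts are placed at equal fractions along the
    top and bottom edges, so a tilted line yields tilted character cells.
    """
    if n_chars < 1:
        raise ValueError("n_chars must be at least 1")
    tl, tr, br, bl = quad
    cells: List[Quad] = []
    for i in range(n_chars):
        t0, t1 = i / n_chars, (i + 1) / n_chars
        cells.append((
            lerp(tl, tr, t0),  # cell top-left
            lerp(tl, tr, t1),  # cell top-right
            lerp(bl, br, t1),  # cell bottom-right
            lerp(bl, br, t0),  # cell bottom-left
        ))
    return cells


if __name__ == "__main__":
    line = ((0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 1.0))
    for cell in cut_text_region(line, 4):
        print(cell)
```

Each resulting cell has a closed outline of roughly character size, which is the property the abstract relies on for better anchor-box fitting.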

CLC Number:

  • TP389.1
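
The abstract does not define the RIoU loss, so its exact form cannot be reproduced here. As a hedged illustration of the quantity such a loss presumably builds on, the sketch below computes the plain intersection over union of two rotated boxes by clipping one convex polygon against the other (Sutherland-Hodgman clipping). The box format `(cx, cy, w, h, angle_deg)` and all function names are assumptions made for this example; any angle-specific penalty term of RIoU is not modelled.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
# Assumed box format: centre x, centre y, width, height, rotation in degrees.
Box = Tuple[float, float, float, float, float]


def box_corners(cx: float, cy: float, w: float, h: float, angle_deg: float) -> List[Point]:
    """Corners of a rotated rectangle, listed counter-clockwise in x/y axes."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    offsets = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula; returns 0.0 for an empty polygon."""
    pairs = zip(poly, poly[1:] + poly[:1])
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)) / 2


def side(a: Point, b: Point, p: Point) -> float:
    """Positive when p lies to the left of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def clip_convex(subject: List[Point], clipper: List[Point]) -> List[Point]:
    """Sutherland-Hodgman clipping of subject by a convex, counter-clockwise clipper."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        current, output = output, []
        for j in range(len(current)):
            p, q = current[j - 1], current[j]
            dp, dq = side(a, b, p), side(a, b, q)
            if (dp >= 0) != (dq >= 0):
                # The segment p-q crosses the clipping edge: keep the crossing point.
                t = dp / (dp - dq)
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
            if dq >= 0:
                output.append(q)
    return output


def rotated_iou(box_a: Box, box_b: Box) -> float:
    """Plain IoU of two rotated boxes (no angle penalty term)."""
    pa, pb = box_corners(*box_a), box_corners(*box_b)
    inter_poly = clip_convex(pa, pb)
    inter = polygon_area(inter_poly) if len(inter_poly) >= 3 else 0.0
    union = polygon_area(pa) + polygon_area(pb) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    print(rotated_iou((0, 0, 2, 2, 0), (1, 0, 2, 2, 0)))
    print(rotated_iou((0, 0, 2, 2, 0), (0, 0, 2, 2, 45)))
```

The result does not depend on whether the angle is read as clockwise or counter-clockwise, provided both boxes use the same convention. A training loss would typically be derived from this value (for example as one minus the IoU) with additional terms; which terms RIoU adds to strengthen angle learning is not stated in the abstract.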