Computer Science (计算机科学), 2022, Vol. 49, Issue (11A): 210900185-6. doi: 10.11896/jsjkx.210900185
RAN Yu (冉煜), ZHANG Li (张莉)
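Abstract: YOLOv5 is currently one of the better-performing models for text detection. Text in natural scenes is still hard to detect, because text targets vary in length, text contours are difficult to localize precisely, and characters are often tilted or affected by lighting and shadow. To address these problems, this paper proposes R-YOLOv5 (Rotated-YOLOv5), a text detection model. First, a text segmentation model based on an affine algorithm is incorporated. It cuts the text region of an image proportionally into multiple single-character blocks, which addresses the poor anchor-box fit that YOLOv5 shows on text because text has no closed contour. Second, rotated convolution layers, rotated pooling layers, and improved anchor boxes are used, and an RIoU (Rotated Intersection over Union) loss function that strengthens angle learning is proposed, so that the model can extract features of rotated and tilted text. The original and improved models are compared on ICDAR2019-LSVT. The results show that R-YOLOv5 clearly improves detection performance, although its training and detection speeds are slightly lower than those of the original model because the network is deeper. Compared with other models, R-YOLOv5 achieves much better detection performance and speed, owing to the inherent advantages of YOLOv5.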
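The abstract names two geometric operations without giving their formulas: cutting a text region proportionally into single-character blocks, and measuring overlap between rotated boxes. The Python sketch below illustrates both under stated assumptions. Boxes are hypothetical `(cx, cy, w, h, theta)` tuples with `theta` in radians; the character count defaults to a hypothetical heuristic `n = round(w / h)` (roughly square characters); and the overlap is a plain rotated IoU computed by Sutherland-Hodgman polygon clipping. It is not the authors' affine-based segmentation model or their RIoU loss, whose extra angle term the abstract does not specify, and it is plain Python rather than a differentiable training loss. All function names are illustrative.

```python
import math
from typing import List, Tuple

# Hypothetical box format (not from the paper): (cx, cy, w, h, theta),
# theta in radians, counter-clockwise in a y-up frame.
Box = Tuple[float, float, float, float, float]
Point = Tuple[float, float]


def box_corners(box: Box) -> List[Point]:
    """Return the four corners of a rotated box in counter-clockwise order."""
    cx, cy, w, h, theta = box
    c, s = math.cos(theta), math.sin(theta)
    local = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in local]


def split_into_char_blocks(box: Box, n: int = 0) -> List[Box]:
    """Cut a text box into n equal-width blocks along its width axis.

    If n <= 0, assume roughly square characters (hypothetical heuristic).
    """
    cx, cy, w, h, theta = box
    if n <= 0:
        n = max(1, round(w / h))
    step = w / n
    c, s = math.cos(theta), math.sin(theta)
    blocks = []
    for i in range(n):
        offset = -w / 2 + step * (i + 0.5)  # block centre along the box's own x-axis
        blocks.append((cx + offset * c, cy + offset * s, step, h, theta))
    return blocks


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula; the absolute value makes it orientation-independent."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def _inside(p: Point, a: Point, b: Point) -> bool:
    # True if p lies on or to the left of the directed edge a -> b.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0


def _intersect(p: Point, q: Point, a: Point, b: Point) -> Point:
    # Intersection of segment p-q with the infinite line through a and b.
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


def clip_polygon(subject: List[Point], clipper: List[Point]) -> List[Point]:
    """Sutherland-Hodgman clipping of `subject` by a convex CCW `clipper`."""
    output = list(subject)
    for a, b in zip(clipper, clipper[1:] + clipper[:1]):
        if not output:
            break
        inp, output = output, []
        for prev, cur in zip(inp[-1:] + inp[:-1], inp):
            if _inside(cur, a, b):
                if not _inside(prev, a, b):
                    output.append(_intersect(prev, cur, a, b))
                output.append(cur)
            elif _inside(prev, a, b):
                output.append(_intersect(prev, cur, a, b))
    return output


def rotated_iou(box1: Box, box2: Box) -> float:
    """Plain rotated IoU: intersection area over union area of two rotated boxes."""
    p1, p2 = box_corners(box1), box_corners(box2)
    inter = clip_polygon(p1, p2)
    inter_area = polygon_area(inter) if len(inter) >= 3 else 0.0
    union = polygon_area(p1) + polygon_area(p2) - inter_area
    return inter_area / union if union > 0 else 0.0


def rotated_iou_loss(pred: Box, target: Box) -> float:
    # Baseline 1 - IoU only. The paper's RIoU loss adds angle emphasis whose
    # exact form is not given in the abstract, so it is not reproduced here.
    return 1.0 - rotated_iou(pred, target)
```

For example, two 4 × 2 boxes with the same centre, one rotated by 90°, overlap only in a 2 × 2 square, so `rotated_iou((0, 0, 4, 2, 0), (0, 0, 4, 2, math.pi / 2))` is 4 / 12 ≈ 0.333, whereas an IoU that ignored the angle would report a perfect match. This is the kind of angle sensitivity a rotation-aware loss is meant to capture.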