Scene Text Detection with Improved Region Proposal Network

doi:10.11896/jsjkx.211000191

Abstract

Abstract: Text in natural scene images has complex and highly variable features. Using a region proposal network (RPN) to extract rectangular text candidate boxes is an indispensable step that can greatly improve text detection accuracy. However, recent studies show that regressing the center point, width, and height of rectangular candidate boxes by minimizing the smooth L₁ loss function tends to lose boundary information and produce inaccurate regression. To address this problem, this paper proposes a scene text detection model based on an improved region proposal network. First, a backbone network composed of a residual network and a feature pyramid network generates a shared feature map. Then, an improved choice of regression points and a vertex-based VIOU loss function (Vertex-IOU) are used to generate a series of rectangular text candidate boxes on the shared feature map. Next, ROI Align converts these candidate boxes into fixed-size feature maps, which are passed to the fully connected layers for bounding box prediction. Finally, comparative experiments on the ICDAR2015 dataset show that the proposed model improves detection accuracy compared with other models, which demonstrates its effectiveness.

Key words: Deep learning, Scene text detection, Region proposal network, Regression method, Loss function
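
The contrast the abstract draws between smooth L₁ regression of center, width, and height and an overlap-based loss computed from box vertices can be illustrated with a minimal sketch. It is not the paper's VIOU loss, whose definition is not given on this page; it uses a plain IoU loss on corner coordinates as a stand-in, applies smooth L₁ to raw box parameters rather than normalized offsets, and all function names and example boxes are hypothetical.

```python
# Illustrative only: plain IoU loss stands in for the paper's VIOU loss,
# and smooth L1 is applied to raw (cx, cy, w, h) values for simplicity.

def smooth_l1(diff, beta=1.0):
    """Smooth L1 penalty for a single residual."""
    a = abs(diff)
    return 0.5 * a * a / beta if a < beta else a - 0.5 * beta


def center_size_loss(pred, target):
    """Sum of smooth L1 penalties over (cx, cy, w, h)."""
    return sum(smooth_l1(p - t) for p, t in zip(pred, target))


def to_vertices(box):
    """Convert (cx, cy, w, h) to corner form (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(a, b):
    """Intersection over union of two axis-aligned boxes in corner form."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, target):
    """1 - IoU, computed from the boxes' vertices."""
    return 1.0 - iou(to_vertices(pred), to_vertices(target))


# A wide, short text line and two predictions that are each off by 4 pixels,
# one in height and one in width (hypothetical numbers).
target = (50.0, 20.0, 80.0, 20.0)
pred_a = (50.0, 20.0, 80.0, 16.0)  # height too small by 4
pred_b = (50.0, 20.0, 76.0, 20.0)  # width too small by 4

# Both predictions receive the same smooth L1 loss ...
print(center_size_loss(pred_a, target), center_size_loss(pred_b, target))
# ... but their overlap with the target differs, so the IoU loss differs.
print(iou_loss(pred_a, target), iou_loss(pred_b, target))
```

In this example the two predictions are indistinguishable to the parameter-wise smooth L₁ loss, while the overlap-based loss penalizes the height error on the short side more heavily (IoU of about 0.80 versus 0.95), which is one way to read the boundary-related weakness the abstract describes.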

CLC number: TP391

LI Junlin, OUYANG Zhi, DU Nisuo. Scene Text Detection with Improved Region Proposal Network[J]. Computer Science, 2023, 50(2): 201-208. https://doi.org/10.11896/jsjkx.211000191

