Computer Science ›› 2023, Vol. 50 ›› Issue (2): 201-208. doi: 10.11896/jsjkx.211000191
LI Junlin (李俊林)1, OUYANG Zhi (欧阳智)2, DU Nisuo (杜逆索)1,2
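Abstract: Text in natural scene images has complex and highly variable characteristics. Extracting rectangular text proposals with a region proposal network (Region Proposal Network, RPN) is an indispensable step that can greatly improve text detection accuracy. However, recent studies show that regressing the center point, width, and height of a rectangular proposal by minimizing the smooth L1 loss tends to lose boundary information and to regress inaccurately. To address this problem, a scene text detection model based on an improved region proposal network is proposed. First, a backbone composed of a residual network and a feature pyramid network generates a shared feature map. Then, an improved choice of regression points and a vertex-based IoU loss function (Vertex-IOU, VIOU) are used to generate a series of rectangular text proposals on the shared feature map. Next, ROI Align converts these proposals into fixed-size feature maps, which are passed to fully connected layers for bounding box prediction. Finally, comparative experiments on the ICDAR2015 dataset show that the proposed model improves detection accuracy over the other models, which demonstrates its effectiveness.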
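The contrast the abstract draws between smooth L1 regression on center, width, and height and an IoU-style loss on box vertices can be illustrated with a small sketch. The abstract does not give the VIOU formula, so the code below uses the plain IoU loss (1 - IoU) on corner coordinates as a stand-in, not the authors' loss. All function names and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration.

```python
def smooth_l1(x, beta=1.0):
    """Smooth L1 penalty on one residual: quadratic near zero, linear beyond beta."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def to_center(box):
    """Convert corner form (x1, y1, x2, y2) to (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0, x2 - x1, y2 - y1)


def smooth_l1_box_loss(pred, target):
    """Sum of smooth L1 penalties over center, width, and height residuals."""
    return sum(smooth_l1(p - t) for p, t in zip(to_center(pred), to_center(target)))


def iou(a, b):
    """Intersection over union of two axis-aligned boxes in corner form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, target):
    """Stand-in for a vertex-based IoU loss: 1 - IoU on the corner coordinates."""
    return 1.0 - iou(pred, target)


# Hypothetical boxes: a ground-truth box and two imperfect predictions.
target = (0.0, 0.0, 10.0, 10.0)
pred_shifted = (1.0, 0.0, 11.0, 10.0)  # same size, shifted right by 1
pred_widened = (0.0, 0.0, 11.0, 10.0)  # left edge exact, right edge off by 1
```

With these boxes the two criteria rank the predictions differently: the smooth L1 loss on (cx, cy, w, h) prefers `pred_shifted`, while the IoU-based loss prefers `pred_widened`, which overlaps the target more. This only shows that the two objectives can disagree; it says nothing about the accuracy figures reported in the paper.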