Computer Science, 2020, Vol. 47, Issue (11): 199-204. doi: 10.11896/jsjkx.190800145

• Computer Graphics & Multimedia •

Sketch-based Image Retrieval Based on Attention Model

LI Zong-min1, LI Si-yuan1, LIU Yu-jie1, LI Hua2

  1 College of Computer & Communication Engineering, China University of Petroleum (East China), Qingdao, Shandong 266580, China
  2 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2019-08-28 Revised: 2019-12-16 Online: 2020-11-15 Published: 2020-11-05
  • Corresponding author: LI Si-yuan (875031416@qq.com)
  • About author: LI Zong-min (lizongmin@upc.edu.cn), born in 1965, Ph.D, professor, Ph.D supervisor, is a member of China Computer Federation. His main research interests include computer graphics, image processing, and scientific computing visualization.
    LI Si-yuan, born in 1996, postgraduate. His main research interests include computer vision, image processing, image retrieval, and sketch image recognition.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61379106, 61379082, 61227802) and the Natural Science Foundation of Shandong Province (ZR2013FM036, ZR2015FM011).

Abstract: Hand-drawn sketches have sparse features and are prone to geometric distortion, which makes sketch-based image retrieval (SBIR) difficult. To address these problems, this paper proposes a feature extraction method based on an attention model, which obtains efficient and accurate retrieval results by precisely extracting the semantic features of sketches. First, a convolutional neural network is used as the basic framework for extracting semantic features. Then, an attention mechanism is introduced during supervised training: an attention block, composed of a spatial attention structure and a channel attention structure, is added after the last convolutional layer of the network to locate effective semantic features. Finally, semantic features from different levels are fused into the final feature descriptor to achieve high retrieval accuracy. Experimental results on the benchmark Flickr15k dataset show that the proposed method is feasible and effective. In addition, the proposed attention mechanism greatly improves classification accuracy on the sketch classification task.

Key words: Attention model, Convolutional neural network, Sketch classification, Sketch-based image retrieval

CLC Number:

  • TP391.41
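
The pipeline outlined in the abstract (an attention block after the last convolutional layer, then fusion of features from different levels into one descriptor) can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract only says that the block combines channel and spatial attention, so the squeeze-and-excitation style channel gate, the pooled-map spatial mask, the order in which the two are applied, the pooling-and-concatenation fusion, the Euclidean ranking, and all names and shapes below (`channel_attention`, `spatial_attention`, `fuse_descriptor`, `rank_gallery`, the random weights) are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Reweight channels of a (C, H, W) feature map (assumed SE-style gate)."""
    squeezed = feat.mean(axis=(1, 2))          # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)    # bottleneck with ReLU -> (C // r,)
    weights = sigmoid(w2 @ hidden)             # one weight in (0, 1) per channel
    return feat * weights[:, None, None]


def spatial_attention(feat, w_avg=1.0, w_max=1.0):
    """Reweight spatial positions using channel-pooled maps (assumed design)."""
    avg_map = feat.mean(axis=0)                # (H, W)
    max_map = feat.max(axis=0)                 # (H, W)
    mask = sigmoid(w_avg * avg_map + w_max * max_map)
    return feat * mask[None, :, :]


def attention_block(feat, w1, w2):
    """Channel attention followed by spatial attention (order is an assumption)."""
    return spatial_attention(channel_attention(feat, w1, w2))


def fuse_descriptor(feature_maps):
    """Pool each level to a vector, L2-normalise, concatenate, renormalise."""
    parts = []
    for fm in feature_maps:
        v = fm.mean(axis=(1, 2))
        parts.append(v / (np.linalg.norm(v) + 1e-12))
    desc = np.concatenate(parts)
    return desc / (np.linalg.norm(desc) + 1e-12)


def rank_gallery(query_desc, gallery_descs):
    """Return gallery indices sorted by Euclidean distance to the query."""
    dists = np.linalg.norm(gallery_descs - query_desc, axis=1)
    return np.argsort(dists)


# Toy stand-ins for CNN activations; a real system would take these from a trained network.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
last_conv = rng.standard_normal((C, H, W))     # output of the last conv layer
mid_conv = rng.standard_normal((4, 8, 8))      # an earlier, lower-level feature map
w1 = rng.standard_normal((C // r, C)) * 0.1    # hypothetical learned weights
w2 = rng.standard_normal((C, C // r)) * 0.1

attended = attention_block(last_conv, w1, w2)
descriptor = fuse_descriptor([mid_conv, attended])
```

In this sketch both gates produce weights strictly between 0 and 1, so the block can only damp activations, never amplify them. Retrieval then reduces to computing such a descriptor for the query sketch and for each gallery image, and sorting the gallery with `rank_gallery`.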