Computer Science (计算机科学), 2020, Vol. 47, Issue (11): 199-204. doi: 10.11896/jsjkx.190800145
LI Zong-min (李宗民)1, LI Si-yuan (李思远)1, LIU Yu-jie (刘玉杰)1, LI Hua (李华)2
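Abstract: Sketches used in sketch-based image retrieval have sparse features and are easily deformed. To address these problems, this paper proposes a feature extraction method based on an attention model, which obtains efficient and accurate retrieval results by precisely extracting the semantic features of sketches. First, a convolutional neural network serves as the base framework for extracting semantic features. Then, an attention mechanism is introduced during supervised training: an attention block, composed jointly of a spatial attention structure and a channel attention structure, is inserted after the last convolutional layer of the network to locate the effective semantic features. Finally, semantic features from different levels are fused into the final feature descriptor to achieve high-precision retrieval. Experimental results on the benchmark dataset Flickr15k show that the proposed method is feasible and effective. In addition, the proposed attention mechanism substantially improves accuracy on the sketch classification task.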
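The pipeline outlined in the abstract (an attention block after the last convolutional layer, followed by fusion of features from different levels) can be illustrated with a minimal NumPy sketch. The abstract does not give the internal design of the block, so everything here is an assumption: the channel-then-spatial ordering, the squeeze-and-excitation style channel gate, the scalar-mixed spatial mask (used in place of a learned convolution), the random weights standing in for trained ones, and all function names and tensor shapes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Gate each channel of a (C, H, W) feature map.

    Assumed design: global average pooling, a two-layer MLP with ReLU,
    then a sigmoid gate per channel.
    """
    squeezed = feat.mean(axis=(1, 2))          # (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)    # (C // r,)
    gate = sigmoid(w2 @ hidden)                # (C,), values in (0, 1)
    return feat * gate[:, None, None]

def spatial_attention(feat, w_avg, w_max):
    """Gate each spatial location of a (C, H, W) feature map.

    Assumed design: mix the channel-wise mean and max maps with two scalars
    (a stand-in for a learned convolution), then apply a sigmoid mask.
    """
    avg_map = feat.mean(axis=0)                # (H, W)
    max_map = feat.max(axis=0)                 # (H, W)
    mask = sigmoid(w_avg * avg_map + w_max * max_map)
    return feat * mask[None, :, :]

def attention_block(feat, params):
    """Channel attention followed by spatial attention (ordering is an assumption)."""
    out = channel_attention(feat, params["w1"], params["w2"])
    return spatial_attention(out, params["w_avg"], params["w_max"])

def l2_normalize(v, eps=1e-12):
    return v / (np.linalg.norm(v) + eps)

def fused_descriptor(mid_feat, last_feat, params):
    """Fuse a pooled mid-level map with the attended last-layer map."""
    attended = attention_block(last_feat, params)
    mid_vec = l2_normalize(mid_feat.mean(axis=(1, 2)))
    top_vec = l2_normalize(attended.mean(axis=(1, 2)))
    return l2_normalize(np.concatenate([mid_vec, top_vec]))

def rank_gallery(query_desc, gallery_descs):
    """Return gallery indices sorted by descending cosine similarity."""
    scores = gallery_descs @ query_desc        # descriptors are unit length
    return np.argsort(-scores)

# Hypothetical shapes; random tensors stand in for real CNN activations.
rng = np.random.default_rng(0)
C_MID, C_LAST, REDUCTION = 8, 16, 4
params = {
    "w1": rng.normal(size=(C_LAST // REDUCTION, C_LAST)),
    "w2": rng.normal(size=(C_LAST, C_LAST // REDUCTION)),
    "w_avg": 1.0,
    "w_max": 1.0,
}
mid = rng.random((C_MID, 14, 14))
last = rng.random((C_LAST, 7, 7))
desc = fused_descriptor(mid, last, params)

gallery = np.stack([
    l2_normalize(rng.random(C_MID + C_LAST)),
    l2_normalize(rng.random(C_MID + C_LAST)),
    desc,
])
ranking = rank_gallery(desc, gallery)
```

In a real system the query descriptor would come from a sketch and the gallery descriptors from natural images (or their edge maps), with the weights learned during supervised training rather than sampled at random.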