Computer Science, 2022, Vol. 49, Issue (11A): 211200087-6. doi: 10.11896/jsjkx.211200087

• Image Processing & Multimedia Technology •

Lycium Barbarum Pest Retrieval Based on Attention and Visual Semantic Reasoning

HAN Hui-zhen, LIU Li-bo   

  1. School of Information Engineering, Ningxia University, Yinchuan 750021, China
  • Online: 2022-11-10  Published: 2022-11-21
  • Corresponding author: LIU Li-bo (liulib@163.com)
  • About author: (hhz52122@163.com)
  • Supported by:
    National Natural Science Foundation of China (61862050); Natural Science Foundation of Ningxia (2020AAC03031)

Lycium Barbarum Pest Retrieval Based on Attention and Visual Semantic Reasoning

HAN Hui-zhen, LIU Li-bo   

  1. School of Information Engineering,Ningxia University,Yinchuan 750021,China
  • Online:2022-11-10 Published:2022-11-21
  • About author: HAN Hui-zhen, born in 1995, postgraduate. Her main research interests include information retrieval and so on.
    LIU Li-bo, born in 1974, Ph.D, professor, is a member of China Computer Federation. Her main research interests include intelligent information processing and so on.
  • Supported by:
    National Natural Science Foundation of China (61862050) and Natural Science Foundation of Ningxia (2020AAC03031).

摘要 (Abstract): To address the single-modality limitation of traditional crop-pest retrieval, this work combines attention with visual semantic reasoning to study image-text cross-modal retrieval of 17 common Lycium barbarum (wolfberry) pests. First, Faster R-CNN with a ResNet101 backbone implements the attention mechanism to extract local fine-grained information from pest images. Next, visual semantic reasoning is introduced: connections are built between image regions, and a graph convolutional network (GCN) performs region-relation reasoning to enhance the region representations. Global semantic reasoning is then applied to select discriminative features and filter out unimportant content, capturing more key semantic information. Finally, modal interaction deeply mines the semantic associations between the image and text modalities of Lycium barbarum pests. On a self-built Lycium barbarum pest dataset, mean average precision (MAP) is used as the evaluation metric in comparative and ablation experiments. The results show that the average MAP over image-to-text and text-to-image retrieval reaches 0.522, an improvement of 0.048 to 0.244 over eight mainstream methods, indicating better retrieval performance.

关键词 (Key words): Cross-modal retrieval, Attention mechanism, Fine-grained, Visual semantic reasoning, Lycium barbarum pest

Abstract: To address the single-modality limitation of traditional pest retrieval, this paper proposes a cross-modal image-text retrieval method for 17 common Lycium barbarum pests that integrates an attention mechanism with visual semantic reasoning. First, Faster R-CNN with a ResNet101 backbone implements the attention mechanism to extract local fine-grained information from wolfberry pest images. Then, visual semantic reasoning is introduced to build connections between image regions, and a graph convolutional network (GCN) performs region-relation reasoning to enhance the region representations. In addition, global semantic reasoning strengthens the semantic correlation between regions, selecting discriminative features and filtering out unimportant information to capture more key semantic information. Finally, the semantic association between the image and text modalities of Lycium barbarum pests is deeply explored through modal interaction. On a self-built Lycium barbarum pest dataset, mean average precision (MAP) is used as the evaluation metric for comparative and ablation experiments. Experimental results demonstrate that the proposed method achieves an average MAP of 0.522, an improvement of 0.048 to 0.244 over eight mainstream methods, showing better retrieval performance.
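The region-relation reasoning step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weight matrices (`Wq`, `Wk`, `Wg`), the affinity-based fully connected region graph, and the residual connection are assumptions modelled on common GCN-based visual reasoning designs such as VSRN.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def region_relation_reasoning(X, Wq, Wk, Wg):
    """One GCN-style reasoning step over detected image regions.

    X: (N, d) region features from the detector (e.g. Faster R-CNN).
    Pairwise affinities define the edge weights of a fully connected
    region graph; each region then aggregates its neighbours.
    """
    # pairwise affinity between regions -> normalized edge weights
    A = softmax((X @ Wq) @ (X @ Wk).T, axis=-1)   # (N, N)
    # graph convolution: aggregate neighbour features, then project
    H = A @ X @ Wg                                 # (N, d)
    # residual connection keeps the original region evidence
    return np.maximum(H, 0.0) + X

rng = np.random.default_rng(0)
N, d = 36, 64                    # e.g. 36 detected regions, 64-dim features
X = rng.standard_normal((N, d))
Wq, Wk, Wg = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
out = region_relation_reasoning(X, Wq, Wk, Wg)
print(out.shape)  # (36, 64): one enhanced feature per region
```

In practice several such layers would be stacked and the weights learned end to end with the retrieval loss; the single NumPy step above only shows the data flow.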

Key words: Cross-modal retrieval, Attention mechanism, Fine-grained, Visual semantic reasoning, Lycium barbarum pest
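For reference, the mean average precision (MAP) metric reported above can be computed as in the sketch below, assuming binary relevance labels over a ranked retrieval list. The example queries are illustrative data, not results from the paper.

```python
def average_precision(ranked_relevance):
    """AP for one query: ranked_relevance is a 0/1 list in rank order."""
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)   # precision at each relevant hit
    return sum(precisions) / max(hits, 1)

def mean_average_precision(all_queries):
    """MAP: mean of per-query average precisions."""
    return sum(average_precision(q) for q in all_queries) / len(all_queries)

queries = [
    [1, 0, 1, 0],   # relevant items retrieved at ranks 1 and 3 -> AP = 5/6
    [0, 1, 1, 0],   # relevant items retrieved at ranks 2 and 3 -> AP = 7/12
]
print(mean_average_precision(queries))  # approximately 0.708
```

The paper's reported value of 0.522 is, per the abstract, this quantity averaged over the image-to-text and text-to-image retrieval directions.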

CLC number: TP391