Computer Science ›› 2022, Vol. 49 ›› Issue (11A): 211200087-6.doi: 10.11896/jsjkx.211200087

• Image Processing & Multimedia Technology •

Lycium Barbarum Pest Retrieval Based on Attention and Visual Semantic Reasoning

HAN Hui-zhen, LIU Li-bo   

  1. School of Information Engineering, Ningxia University, Yinchuan 750021, China
  • Online: 2022-11-10  Published: 2022-11-21
  • About author: HAN Hui-zhen, born in 1995, postgraduate. Her main research interests include information retrieval.
    LIU Li-bo, born in 1974, Ph.D, professor, is a member of China Computer Federation. Her main research interests include intelligent information processing.
  • Supported by:
    National Natural Science Foundation of China (61862050) and Ningxia Natural Science Foundation of China (2020AAC03031).

Abstract: To address the problem that traditional pest retrieval models support only a single modality, this paper proposes a cross-modal retrieval method for 17 common species of Lycium barbarum (wolfberry) pests across the image and text modalities, integrating an attention mechanism with visual semantic reasoning. First, Faster R-CNN with a ResNet101 backbone implements the attention mechanism, extracting local fine-grained features from pest images. Visual semantic reasoning is then introduced to build connections among image regions, and a graph convolutional network (GCN) performs region-relation reasoning to enhance the region representations. In addition, global semantic reasoning strengthens the semantic correlation between regions, selects discriminative features, and filters out unimportant information so that more key semantic information is captured. Finally, the semantic association between the image and text modalities of Lycium barbarum pests is explored in depth through modal interaction. Comparative and ablation experiments are conducted on a self-built Lycium barbarum pest dataset, with mean average precision (MAP) as the evaluation metric. Experimental results show that the proposed method achieves an average MAP of 0.522 on this dataset, improving on eight mainstream methods by 0.048 to 0.244 and yielding better retrieval performance.
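The region-relation reasoning step described in the abstract (connecting detected image regions in a graph and enhancing their features with a GCN) can be sketched as below. This is a minimal illustration, not the authors' exact formulation: the dot-product affinity graph, the residual update, and all dimensions (36 regions, 64-d features) are illustrative assumptions.

```python
import numpy as np

def gcn_region_reasoning(regions, weight, steps=1):
    """Enhance region features via GCN-style propagation over a region graph.

    regions: (n, d) array of region features (e.g. Faster R-CNN outputs).
    weight:  (d, d) transform matrix (learned in practice; random here).
    """
    x = regions
    for _ in range(steps):
        # Pairwise affinity between regions (dot product), row-normalized
        # with a softmax so each region aggregates from related regions.
        affinity = x @ x.T
        affinity = np.exp(affinity - affinity.max(axis=1, keepdims=True))
        adj = affinity / affinity.sum(axis=1, keepdims=True)
        # Graph convolution: propagate, transform, ReLU, residual connection.
        x = x + np.maximum(adj @ x @ weight, 0.0)
    return x

rng = np.random.default_rng(0)
feats = rng.normal(size=(36, 64)).astype(np.float32)        # 36 regions, 64-d
w = (0.01 * rng.normal(size=(64, 64))).astype(np.float32)
enhanced = gcn_region_reasoning(feats, w)
print(enhanced.shape)  # (36, 64): same shape, relation-enhanced features
```

The residual form keeps the original region evidence while mixing in context from related regions, which matches the abstract's goal of "enhancing area representation" rather than replacing it.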

Key words: Cross-modal retrieval, Attention mechanism, Fine-grained, Visual semantic reasoning, Lycium barbarum pest

CLC Number: TP391