Computer Science ›› 2021, Vol. 48 ›› Issue (1): 190-196. doi: 10.11896/jsjkx.200600076

• Computer Graphics & Multimedia •

Remote Sensing Image Description Generation Method Based on Attention and Multi-scale Feature Enhancement

ZHAO Jia-qi1,2,3, WANG Han-zheng1,2, ZHOU Yong1,2, ZHANG Di1,2, ZHOU Zi-yuan1,2   

  1 School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
  2 Engineering Research Center of Mine Digitization, Ministry of Education of People's Republic of China, Xuzhou, Jiangsu 221116, China
  3 Innovation Research Center of Disaster Intelligent Prevention and Emergency Rescue, Xuzhou, Jiangsu 221116, China
  • Received:2020-06-12 Revised:2020-11-25 Online:2021-01-15 Published:2021-01-15
  • About author: ZHAO Jia-qi, born in 1988, Ph.D., associate professor, is a member of China Computer Federation. His main research interests include multiobjective optimization, machine learning, deep learning and image processing.
    ZHOU Yong, born in 1974, Ph.D., professor, Ph.D. supervisor, is a member of China Computer Federation. His main research interests include data mining, machine learning and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China (61806206), Natural Science Foundation of Jiangsu Province, China (BK20180639) and Opening Project of Science and Technology on Reliability Physics and Application Technology of Electronic Component Laboratory (614280620190403-1).

Abstract: Remote sensing image description generation is an active research topic that involves both computer vision and natural language processing. Its task is to automatically generate a descriptive sentence for a given image. This paper proposes a remote sensing image description generation method based on multi-scale and attention feature enhancement. A soft attention mechanism aligns the generated words with image features, which improves the interpretability of the model. In addition, because remote sensing images have high resolution and their targets vary widely in scale, this paper proposes a feature extraction network based on pyramid pooling and a channel attention mechanism (Pyramid Pool and Channel Attention Network, PCAN) to capture multi-scale information in remote sensing images and local cross-channel interaction information. The image features extracted by the model are used as the input to the soft attention mechanism in the description generation stage, which computes the context information; the context information is then fed into the LSTM network to obtain the final output sequence. Experiments on the effectiveness of PCAN and the soft attention mechanism on the RSICD and MSCOCO datasets show that adding PCAN and the soft attention mechanism improves the quality of the generated sentences and aligns words with image features. Visualization analysis of the soft attention mechanism improves the credibility of the model results. In addition, experiments on a semantic segmentation dataset show that the proposed PCAN is also effective for semantic segmentation tasks.
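
The soft attention step summarized in the abstract (image features in, context information out, context fed to the LSTM) can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `soft_attention`, the toy feature values, and the dot-product scoring are assumptions chosen for brevity, and the scoring function actually used in the paper is not stated in the abstract.

```python
import math


def soft_attention(features, hidden):
    """Return (weights, context) for a set of image region features.

    features: list of L region vectors, each of length D
              (stands in for the encoder output; hypothetical values).
    hidden:   vector of length D (stands in for the LSTM hidden state).
    Scoring is a plain dot product, an assumption made for this sketch.
    """
    # One relevance score per image region.
    scores = [sum(f_d * h_d for f_d, h_d in zip(f, hidden)) for f in features]

    # Numerically stable softmax turns scores into weights that sum to 1.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the region features.
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context


# Toy example: three regions with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
weights, context = soft_attention(features, hidden)
```

In a captioning decoder, `context` would be passed to the LSTM at each time step, and `weights` is what an attention visualization plots over the image regions.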

Key words: Attention mechanism, Feature enhancement, Long short-term memory, Remote sensing image description generation

CLC Number: 

  • TP753