Computer Science, 2021, Vol. 48, Issue (1): 190-196. doi: 10.11896/jsjkx.200600076

• Computer Graphics and Multimedia •

Remote Sensing Image Description Generation Method Based on Attention and Multi-scale Feature Enhancement

ZHAO Jia-qi (1,2,3), WANG Han-zheng (1,2), ZHOU Yong (1,2), ZHANG Di (1,2), ZHOU Zi-yuan (1,2)

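  1 School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
    2 Engineering Research Center of Mine Digitization, Ministry of Education of the People's Republic of China, Xuzhou, Jiangsu 221116, China
    3 Innovation Research Center of Disaster Intelligent Prevention and Emergency Rescue, Xuzhou, Jiangsu 221116, China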
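  • Received: 2020-06-12  Revised: 2020-11-25  Online: 2021-01-15  Published: 2021-01-15
  • Corresponding author: ZHOU Yong (yzhou@cumt.edu.cn)
  • About author: jiaqizhao@cumt.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61806206), Natural Science Foundation of Jiangsu Province, China (BK20180639) and Opening Project of Science and Technology on Reliability Physics and Application Technology of Electronic Component Laboratory (614280620190403-1).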

  • About author: ZHAO Jia-qi, born in 1988, Ph.D., associate professor, is a member of the China Computer Federation. His main research interests include multi-objective optimization, machine learning, deep learning and image processing.
    ZHOU Yong, born in 1974, Ph.D., professor, Ph.D. supervisor, is a member of the China Computer Federation. His main research interests include data mining, machine learning and artificial intelligence.

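Abstract: Remote sensing image description generation is a popular research topic that spans both computer vision and natural language processing; its goal is to automatically generate a descriptive sentence for a given image. This paper proposes a remote sensing image description generation method based on multi-scale and attention feature enhancement, which uses a soft attention mechanism to align generated words with image features and thereby improves the interpretability of the model. In addition, because remote sensing images have high resolution and large variations in target scale, the paper proposes a feature extraction network based on pyramid pooling and a channel attention mechanism (Pyramid Pool and Channel Attention Network, PCAN) to capture multi-scale information and local cross-channel interaction information in remote sensing images. The image features extracted by PCAN are used as the input to the soft attention mechanism in the description generation stage, which computes context information; this context information is then fed into an LSTM network to obtain the final output sequence. Experiments on the RSICD and MSCOCO datasets that evaluate the effectiveness of PCAN and the soft attention mechanism show that adding both components improves the quality of the generated sentences and achieves alignment between words and image features. Visualization analysis of the soft attention mechanism further increases the credibility of the model's results. Finally, experiments on a semantic segmentation dataset show that the proposed PCAN is also effective for semantic segmentation tasks.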
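The abstract describes the pipeline only at a high level (PCAN features, then soft attention, then a context vector, then an LSTM). The sketch below illustrates two of those building blocks under explicit assumptions; it is not the authors' implementation. `channel_attention` assumes an ECA-style design (global average pooling per channel, a shared 1D convolution over neighbouring channels, and a sigmoid gate), which is one common way to model local cross-channel interaction. `soft_attention` follows the additive soft attention of Show, Attend and Tell (Xu et al., 2015). All function names, shapes and weights are hypothetical, and the pyramid pooling branch and the LSTM decoder are not shown.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def channel_attention(fmap: Matrix, kernel: Vector) -> Matrix:
    """ECA-style channel attention (an assumed design, not PCAN's published code).

    fmap:   C channels, each a flattened list of spatial activations.
    kernel: odd-length 1D convolution weights shared across channels, so each
            channel's gate depends only on its nearest neighbouring channels.
    """
    pooled = [sum(ch) / len(ch) for ch in fmap]        # global average pooling
    pad = len(kernel) // 2
    gates = []
    for c in range(len(pooled)):
        s = 0.0
        for j, k in enumerate(kernel):                  # zero-padded 1D convolution
            idx = c + j - pad
            if 0 <= idx < len(pooled):
                s += k * pooled[idx]
        gates.append(1.0 / (1.0 + math.exp(-s)))       # sigmoid gate in (0, 1)
    return [[g * x for x in ch] for g, ch in zip(gates, fmap)]


def soft_attention(regions: Matrix, hidden: Vector, w_a: Matrix,
                   w_h: Matrix, v: Vector) -> Tuple[Vector, Vector]:
    """Additive soft attention over L region features.

    regions: L feature vectors a_i of dimension D; hidden: LSTM state h_{t-1}.
    Score e_i = v . tanh(W_a a_i + W_h h_{t-1}); alpha = softmax(e);
    context z_t = sum_i alpha_i * a_i (the input an LSTM would receive at step t).
    """
    proj_h = matvec(w_h, hidden)
    scores = []
    for a in regions:
        proj_a = matvec(w_a, a)
        scores.append(sum(vk * math.tanh(pa + ph)
                          for vk, pa, ph in zip(v, proj_a, proj_h)))
    alpha = softmax(scores)
    context = [sum(alpha[i] * regions[i][d] for i in range(len(regions)))
               for d in range(len(regions[0]))]
    return alpha, context


if __name__ == "__main__":
    # Toy feature map: 3 channels x 4 spatial positions, kernel size 3.
    fmap = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0], [-1.0, -1.0, -1.0, -1.0]]
    enhanced = channel_attention(fmap, [0.5, 1.0, 0.5])
    # Treat each spatial position as one region whose feature is its channel values.
    regions = [[enhanced[c][p] for c in range(3)] for p in range(4)]
    alpha, context = soft_attention(regions, hidden=[0.1, -0.2],
                                    w_a=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                                    w_h=[[1.0, 0.0], [0.0, 1.0]], v=[1.0, 1.0])
    print([round(a, 3) for a in alpha], [round(z, 3) for z in context])
```

Running the module prints the attention weights over the four toy regions and the resulting context vector. Because the weights come from a softmax, they are positive and sum to 1, so the context vector is a convex combination of region features; visualizing these weights per generated word is the kind of alignment analysis the abstract refers to.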

Key words: Attention mechanism, Feature enhancement, Long short-term memory, Remote sensing image description generation


CLC number: TP753