Computer Science, 2021, Vol. 48, Issue (1): 190-196. doi: 10.11896/jsjkx.200600076
赵佳琦1,2,3, 王瀚正1,2, 周勇1,2, 张迪1,2, 周子渊1,2
ZHAO Jia-qi1,2,3, WANG Han-zheng1,2, ZHOU Yong1,2, ZHANG Di1,2, ZHOU Zi-yuan1,2
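Abstract: Remote sensing image captioning is an active research topic spanning computer vision and natural language processing; its task is to automatically generate a descriptive sentence for a given image. This paper proposes a remote sensing image captioning method based on multi-scale and attention feature enhancement, which uses a soft attention mechanism to align generated words with image features. In addition, to address the high resolution and large variation in target scale typical of remote sensing images, a feature extraction network based on pyramid pooling and channel attention (Pyramid Pool and Channel Attention Network, PCAN) is proposed to capture multi-scale and local cross-channel interaction information. The image features extracted by this network serve as the input to the soft attention mechanism in the caption generation stage, which computes context information; this context is then fed into an LSTM network to produce the final output sequence. Experiments on the RSICD and MSCOCO datasets evaluate the effectiveness of PCAN and the soft attention mechanism. The results show that adding them improves the quality of the generated sentences and achieves alignment between words and image features. A visual analysis of the soft attention mechanism improves the credibility of the model's results. Furthermore, experiments on semantic segmentation datasets show that the proposed PCAN is also effective for the semantic segmentation task.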
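The soft attention step described in the abstract (image features in, context vector out, context passed to the LSTM) can be illustrated with a minimal sketch. The abstract does not give the scoring function, so the additive (tanh-based) scoring used here is an assumption, and the parameter names `W_f`, `W_h`, and `v`, along with all dimensions, are hypothetical and not taken from the paper.

```python
import math
import random


def soft_attention(features, hidden, W_f, W_h, v):
    """Soft attention over image regions (illustrative sketch only).

    features: list of L region feature vectors, each of length D
    hidden:   previous decoder (e.g. LSTM) hidden state, length H
    W_f (D x K), W_h (H x K), v (K): hypothetical learned parameters
    Returns (context, alpha): the context vector (length D) and the
    attention weights over the L regions.
    """
    K = len(v)
    # Project the hidden state once; it is shared by every region.
    h_proj = [sum(hidden[j] * W_h[j][k] for j in range(len(hidden)))
              for k in range(K)]
    scores = []
    for f in features:
        f_proj = [sum(f[i] * W_f[i][k] for i in range(len(f)))
                  for k in range(K)]
        # Assumed additive scoring: v . tanh(W_f f + W_h h)
        scores.append(sum(v[k] * math.tanh(f_proj[k] + h_proj[k])
                          for k in range(K)))
    # Softmax over regions, shifted by the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    # Context vector: attention-weighted sum of the region features.
    D = len(features[0])
    context = [sum(alpha[r] * features[r][d] for r in range(len(features)))
               for d in range(D)]
    return context, alpha


# Toy usage with random values; all sizes are arbitrary.
random.seed(0)
L, D, H, K = 9, 4, 3, 5
features = [[random.gauss(0, 1) for _ in range(D)] for _ in range(L)]
hidden = [random.gauss(0, 1) for _ in range(H)]
W_f = [[random.gauss(0, 1) for _ in range(K)] for _ in range(D)]
W_h = [[random.gauss(0, 1) for _ in range(K)] for _ in range(H)]
v = [random.gauss(0, 1) for _ in range(K)]
context, alpha = soft_attention(features, hidden, W_f, W_h, v)
```

Because the weights `alpha` form a probability distribution over image regions, they can be reshaped to the feature-map grid and overlaid on the image, which is the kind of visualization the abstract refers to.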