%A ZHAO Jia-qi, WANG Han-zheng, ZHOU Yong, ZHANG Di, ZHOU Zi-yuan
%T Remote Sensing Image Description Generation Method Based on Attention and Multi-scale Feature Enhancement
%0 Journal Article
%D 2021
%J Computer Science
%R 10.11896/jsjkx.200600076
%P 190-196
%V 48
%N 1
%U {https://www.jsjkx.com/CN/abstract/article_19687.shtml}
%8 2021-01-15
%X Remote sensing image description generation is an active research topic spanning computer vision and natural language processing. Its main task is to automatically generate a descriptive sentence for a given image. This paper proposes a remote sensing image description generation method based on multi-scale and attention feature enhancement. A soft attention mechanism aligns generated words with image features, which improves the interpretability of the model. In addition, because remote sensing images have high resolution and large variations in target scale, this paper proposes a feature extraction network based on pyramid pooling and a channel attention mechanism (Pyramid Pool and Channel Attention Network, PCAN) to capture multi-scale features of remote sensing images and local cross-channel mutual information. The image features extracted by the model serve as input to the soft attention mechanism in the description generation stage, which computes the context information; this context information is then fed into an LSTM network to obtain the final output sequence. Experiments on the effectiveness of PCAN and the soft attention mechanism on the RSICD and MSCOCO datasets show that adding PCAN and the soft attention mechanism improves the quality of generated sentences and realizes the alignment between words and image features. Visualization analysis of the soft attention mechanism improves the credibility of the model results. In addition, experiments on a semantic segmentation dataset show that the proposed PCAN is also effective for semantic segmentation tasks.