Computer Science ›› 2021, Vol. 48 ›› Issue (4): 157-163. doi: 10.11896/jsjkx.200300146

Generation of Image Caption of Joint Self-attention and Recurrent Neural Network

WANG Xi1, ZHANG Kai1, LI Jun-hui1, KONG Fang1, ZHANG Yi-tian2   

  1 School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  2 China Industrial Control Systems Cyber Emergency Response Team, Beijing 100000, China
  • Received: 2020-06-24 Revised: 2020-07-12 Online: 2021-04-15 Published: 2021-04-09
  • About author: WANG Xi, born in 1995, postgraduate, is a member of China Computer Federation. Her main research interests include natural language processing and image captioning.
    LI Jun-hui, born in 1983, associate professor. His main research interests include natural language processing and machine translation.
  • Supported by:
    National Natural Science Foundation of China (61876120).

Abstract: At present, most image caption generation models consist of an image encoder based on a convolutional neural network (CNN) and a caption decoder based on a recurrent neural network (RNN). The image encoder extracts visual features from images, while the caption decoder generates captions from those visual features using an attention mechanism. Although the decoder uses an RNN with an attention mechanism to model the interaction between image features and captions, it ignores self-attention over the internal interactions within images or within captions. Therefore, this paper proposes a novel model for image caption generation that combines the advantages of RNNs and self-attention networks. On the one hand, through self-attention the model simultaneously captures interactions within and between modalities in a unified attention area. On the other hand, it retains the inherent advantages of the RNN. Experimental results on the MSCOCO dataset show that the proposed model outperforms the baseline, improving the CIDEr score from 1.135 to 1.166.

Key words: Image caption, Recurrent neural network, Self-attention mechanism

CLC Number: 

  • TP391.1
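
The idea stated in the abstract, self-attention over a unified sequence of image and caption features followed by a recurrent update, can be illustrated with a minimal NumPy sketch. This is not the authors' model: the function names, dimensions, random weights, the single attention head, the plain tanh RNN cell, and the absence of causal masking are all assumptions chosen only to keep the example short.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def joint_self_attention(image_feats, word_feats, Wq, Wk, Wv):
    # Hypothetical helper: concatenate both modalities into one sequence so
    # that every position can attend to image regions and to caption words.
    x = np.concatenate([image_feats, word_feats], axis=0)  # (n_img + n_words, d)
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # scaled dot-product scores
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights

def rnn_step(h_prev, x_t, Wh, Wx, b):
    # Plain tanh RNN cell, an assumed stand-in for the recurrent decoder.
    return np.tanh(h_prev @ Wh + x_t @ Wx + b)

rng = np.random.default_rng(0)
d = 8                                    # assumed feature size
image_feats = rng.normal(size=(5, d))    # 5 assumed image regions
word_feats = rng.normal(size=(3, d))     # 3 assumed caption word embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

attended, weights = joint_self_attention(image_feats, word_feats, Wq, Wk, Wv)

# Feed the attended caption positions through the recurrent cell in order.
Wh = rng.normal(size=(d, d)) * 0.1
Wx = rng.normal(size=(d, d)) * 0.1
b = np.zeros(d)
h = np.zeros(d)
for x_t in attended[len(image_feats):]:
    h = rnn_step(h, x_t, Wh, Wx, b)
```

In the attention matrix `weights`, the image-to-image and word-to-word blocks correspond to interactions within a modality, and the off-diagonal blocks to interactions between modalities. A real decoder would also mask future words during generation, which this sketch omits.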
