Computer Science, 2020, Vol. 47, Issue (7): 125-129. doi: 10.11896/jsjkx.190700006
• Computer Graphics & Multimedia •
ZHANG Heng1, MA Ming-dong2, WANG De-yu2