Computer Science, 2022, Vol. 49, Issue (11): 134-140. doi: 10.11896/jsjkx.220600010
• Computer Graphics & Multimedia •
MIAO Lan-xin1, LEI Yu1, ZENG Peng-peng1, LI Xiao-yu2, SONG Jing-kuan1