Computer Science, 2025, Vol. 52, Issue (9): 276-281. doi: 10.11896/jsjkx.241200204
• Computer Graphics & Multimedia •
PENG Jiao1, HE Yue1, SHANG Xiaoran2, HU Saier2, ZHANG Bo1, CHANG Yongjuan1, OU Zhonghong3, LU Yanyan1, JIANG Dan1, LIU Yaduo1
[1]LI Y C,SONG Y,CAO L L,et al.TGIF:A New Dataset and Benchmark on Animated GIF Description[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.IEEE,2016:4641-4650.
[2]SHMUELI B,RAY S,KU L W.Happy dance,slow clap:Using reaction GIFs to predict induced affect on Twitter[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing.Stroudsburg,PA:ACL,2021:395-401.
[3]CHEN H,DING G,LIU X,et al.IMRAM:Iterative Matching With Recurrent Attention Memory for Cross-Modal Image-Text Retrieval[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.IEEE,2020:12655-12663.
[4]ZHANG Q,LEI Z,ZHANG Z,et al.Context-Aware Attention Network for Image-Text Retrieval[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.IEEE,2020:3536-3545.
[5]ZHENG F,LI W,WANG X,et al.A Cross-Attention Mechanism Based on Regional-Level Semantic Features of Images for Cross-Modal Text-Image Retrieval in Remote Sensing[J].Applied Sciences,2022,12(23):12221.
[6]SONG Y,SOLEYMANI M.Polysemous visual-semantic embedding for cross-modal retrieval[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition.Piscataway,NJ:IEEE,2019:1979-1988.
[7]WANG X,JURGENS D.An animated picture says at least a thousand words:selecting gif-based replies in multimodal dialog[C]//Findings of the Association for Computational Linguistics:EMNLP 2021.Stroudsburg,PA:ACL,2021:3228-3257.
[8]LI G,DUAN N,FANG Y,et al.Unicoder-vl:A universal encoder for vision and language by cross-modal pre-training[C]//Proceedings of the AAAI Conference on Artificial Intelligence.New York:AAAI,2020:11336-11344.
[9]CONNEAU A,LAMPLE G.Cross-lingual Language Model Pretraining[C]//NeurIPS:Advances in Neural Information Processing Systems.Curran Associates Inc.,2019.
[10]HUANG H,LIANG Y,DUAN N,et al.Unicoder:A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks[J].arXiv:1909.00964,2019.
[11]ZHANG K,MAO Z,WANG Q,et al.Negative-Aware Attention Framework for Image-Text Matching[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.IEEE,2022:15661-15670.
[12]LI X,YIN X,LI C,et al.Oscar:Object-Semantics Aligned Pre-training for Vision-Language Tasks[C]//Proceedings of 16th European Conference on Computer Vision(ECCV 2020).Springer,2020:121-137.
[13]CHEN S,ZHAO Y,JIN Q,et al.Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.IEEE,2020:10638-10647.
[14]SONG X,CHEN J,WU Z,et al.Spatial-Temporal Graphs for Cross-Modal Text2Video Retrieval[J].IEEE Transactions on Multimedia,2022,24:2914-2923.
[15]MIECH A,ZHUKOV D,ALAYRAC J B,et al.HowTo100M:Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.IEEE,2019:2630-2640.
[16]LUO J,LI Y,PAN Y,et al.CoCo-BERT:Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising[C]//Proceedings of the 29th ACM International Conference on Multimedia.New York:ACM,2021:5600-5608.
[17]PENG J,HUANG J,XIONG P,et al.Video-Text As Game Players:Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.IEEE,2023:2472-2482.
[18]DONG J F,ZHANG M,ZHANG Z,et al.Dual Learning with Dynamic Knowledge Distillation for Partially Relevant Video Retrieval[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.IEEE,2023:11302-11312.
[1] TANG Lijun, YANG Zheng, ZHAO Nan, ZHAI Suwei. FLIP-based Joint Similarity Preserving Hashing for Cross-modal Retrieval [J]. Computer Science, 2025, 52(6A): 240400151-10.
[2] SUN Yang, DING Jianwei, ZHANG Qi, WEI Huiwen, TIAN Bowen. Study on Super-resolution Image Reconstruction Using Residual Feature Aggregation Network Based on Attention Mechanism [J]. Computer Science, 2024, 51(6A): 230600039-6.
[3] GAO Nan, ZHANG Lei, LIANG Ronghua, CHEN Peng, FU Zheng. Scene Text Detection Algorithm Based on Feature Enhancement [J]. Computer Science, 2024, 51(6): 256-263.
[4] CAO Qingyuan, ZHU Jianhong. Study on Identification of Concrete Sand and Gravel Aggregate Types Based on Improved Residual Network [J]. Computer Science, 2024, 51(11A): 231000082-6.
[5] LUO Huilan, LONG Jun, LIANG Miaomiao. Attentional Feature Fusion Approach for Siamese Network Based Object Tracking [J]. Computer Science, 2023, 50(6A): 220300237-9.
[6] ZHANG Changfan, MA Yuanyuan, LIU Jianhua, HE Jing. Dual Gating-Residual Feature Fusion for Image-Text Cross-modal Retrieval [J]. Computer Science, 2023, 50(6A): 220700030-7.
[7] YANG Xiaoyu, LI Chao, CHEN Shunyao, LI Haoliang, YIN Guangqiang. Text-Image Cross-modal Retrieval Based on Transformer [J]. Computer Science, 2023, 50(4): 141-148.
[8] ZHANG Longji, ZHAO Hui. Aspect-level Sentiment Analysis Integrating Syntactic Distance and Aspect-attention [J]. Computer Science, 2023, 50(12): 262-269.
[9] WANG Zhendong, DONG Kaikun, HUANG Junheng, WANG Bailing. SemFA: Extreme Multi-label Text Classification Model Based on Semantic Features and Association Attention [J]. Computer Science, 2023, 50(12): 270-278.
[10] GU Baocheng, LIU Li. Cross-modal Hash Retrieval Based on Text-guided Image Semantic Fusion [J]. Computer Science, 2023, 50(11A): 221100191-6.
[11] WANG Lin, LIU Zhe, SHI Dianxi, ZHOU Chenlei, YANG Shaowu, ZHANG Yongjun. Fusion Tracker: Single-object Tracking Framework Fusing Image Features and Event Features [J]. Computer Science, 2023, 50(10): 96-103.
[12] SUN Jie-qi, LI Ya-feng, ZHANG Wen-bo, LIU Peng-hui. Dual-field Feature Fusion Deep Convolutional Neural Network Based on Discrete Wavelet Transformation [J]. Computer Science, 2022, 49(6A): 434-440.
[13] HAN Hui-zhen, LIU Li-bo. Lycium Barbarum Pest Retrieval Based on Attention and Visual Semantic Reasoning [J]. Computer Science, 2022, 49(11A): 211200087-6.
[14] LIU Li-bo, GOU Ting-ting. Cross-modal Retrieval Combining Deep Canonical Correlation Analysis and Adversarial Learning [J]. Computer Science, 2021, 48(9): 200-207.
[15] FENG Xia, HU Zhi-yi, LIU Cai-hua. Survey of Research Progress on Cross-modal Retrieval [J]. Computer Science, 2021, 48(8): 13-23.