Computer Science ›› 2023, Vol. 50 ›› Issue (4): 141-148. DOI: 10.11896/jsjkx.220100083
• Computer Graphics & Multimedia •
YANG Xiaoyu, LI Chao, CHEN Shunyao, LI Haoliang, YIN Guangqiang