Computer Science, 2021, Vol. 48, Issue (8): 13-23. doi: 10.11896/jsjkx.200800165
• Database & Big Data & Data Science •
FENG Xia, HU Zhi-yi, LIU Cai-hua