Computer Science ›› 2021, Vol. 48 ›› Issue (9): 200-207. doi: 10.11896/jsjkx.200600119

• Computer Graphics & Multimedia •

Cross-modal Retrieval Combining Deep Canonical Correlation Analysis and Adversarial Learning

LIU Li-bo, GOU Ting-ting   

  1. School of Information Engineering, Ningxia University, Yinchuan 750021, China
  • Received: 2020-06-19 Revised: 2020-12-04 Online: 2021-09-15 Published: 2021-09-10
  • About author: LIU Li-bo, born in 1974, Ph.D., professor, is a member of the China Computer Federation. Her main research interests include intelligent information processing.
  • Supported by:
    National Natural Science Foundation of China (61862050), Scientific Research Innovation Project of First-Class Western Universities (ZKZD2017005) and Postgraduate Innovation Project of Ningxia University (GIP2019054)

Abstract: This paper proposes a cross-modal retrieval method, DCCA-ACMR, that integrates deep canonical correlation analysis (DCCA) with adversarial learning. The method makes better use of unlabeled samples, learns a more powerful feature projection model, and improves the accuracy of cross-modal retrieval. Specifically, under the DCGAN framework: 1) a deep canonical correlation analysis constraint is added between the two single-modal representation layers of image and text to construct an image-text feature projection model that fully exploits the semantic relevance of sample pairs; 2) the image-text feature projection model is used as the generator and the modality feature classification model is used as the discriminator, together forming an image-text cross-modal retrieval model; 3) the common subspace representation of samples is learned from both labeled and unlabeled samples through the adversarial game between generator and discriminator. The method is evaluated with mean average precision (mAP) on two public datasets, Wikipedia and NUSWIDE-10k. The average mAP values of image-to-text and text-to-image retrieval on the two datasets are 0.556 and 0.563, respectively. Experimental results show that the DCCA-ACMR method outperforms existing representative methods.
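The mAP metric named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration rather than the authors' evaluation code: the function name `mean_average_precision`, the use of cosine similarity in the common subspace, the rule that a retrieved item is relevant when it shares the query's class label, and the toy feature vectors.

```python
import numpy as np


def mean_average_precision(query_feats, query_labels, gallery_feats, gallery_labels):
    """mAP over all queries, ranking the gallery by cosine similarity.

    Assumption: a gallery item is relevant when it has the same class label
    as the query. The abstract does not spell out the relevance rule.
    """
    # L2-normalise rows so that the dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T

    average_precisions = []
    for i in range(sims.shape[0]):
        # Gallery indices sorted from most to least similar.
        order = np.argsort(-sims[i], kind="stable")
        relevant = (gallery_labels[order] == query_labels[i]).astype(float)
        if relevant.sum() == 0:
            continue  # a query with no relevant item is skipped
        hits = np.cumsum(relevant)
        ranks = np.arange(1, relevant.size + 1)
        precision_at_k = hits / ranks
        # Average the precision values taken at the ranks of relevant items.
        average_precisions.append((precision_at_k * relevant).sum() / relevant.sum())
    return float(np.mean(average_precisions)) if average_precisions else 0.0


# Toy common-subspace features (hypothetical values, 2-D for readability).
image_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
image_labels = np.array([0, 1])
text_feats = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])
text_labels = np.array([1, 1, 0])

img2txt = mean_average_precision(image_feats, image_labels, text_feats, text_labels)
txt2img = mean_average_precision(text_feats, text_labels, image_feats, image_labels)
print(f"image-to-text mAP: {img2txt:.3f}, text-to-image mAP: {txt2img:.3f}")
```

Running the function once per direction mirrors the two retrieval tasks reported in the abstract; in practice the features would come from the learned projection model rather than hand-written arrays.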

Key words: Adversarial learning, Cross-modal retrieval, Deep canonical correlation analysis, Deep convolutional generative adversarial network

CLC Number: 

  • TP391.3