Computer Science ›› 2022, Vol. 49 ›› Issue (5): 33-42.doi: 10.11896/jsjkx.210200157

• Computer Graphics & Multimedia •

Study on Cross-media Information Retrieval Based on Common Subspace Classification Learning

HAN Hong-qi1,2, RAN Ya-xin1,2, ZHANG Yun-liang1,2, GUI Jie1, GAO Xiong1,2, YI Meng-lin1,2   

  1 Institute of Scientific and Technical Information of China, Beijing 100038, China
  2 Key Laboratory of Rich-media Knowledge Organization and Service of Digital Publishing Content, National Press and Publication Administration, Beijing 100038, China
  • Received:2021-02-24 Revised:2021-07-15 Online:2022-05-15 Published:2022-05-06
  • About author: HAN Hong-qi, born in 1971, Ph.D., professor and Ph.D. supervisor, is a member of the China Computer Federation. His main research interests include data mining, cross-media retrieval and knowledge engineering.
  • Supported by:
    ISTIC Key Work Project(ZD2020-09) and National Natural Science Foundation of China(71473237).

Abstract: The semantic similarity between data of two different media types cannot be calculated directly because of the heterogeneity gap and the semantic gap between them, which hinders both the implementation and the effectiveness of cross-media retrieval. Although common space learning can achieve cross-media semantic association and retrieval, its retrieval performance is unsatisfactory, mainly because it relies on generic feature extraction techniques and general-purpose classification algorithms for semantic correlation and matching. To address this problem, this study proposes a novel cross-media correlation method, Stacking-DSCM-WR, for cross-media retrieval between documents and images. WR means that text features are extracted with a word-embedding technique and image features are extracted with ResNet. DSCM means that deep semantic correlation and matching is used to project data of different modalities into a common subspace. Stacking is an ensemble learning algorithm; it is employed to produce the distributions of text documents and images over the same high-level conceptual semantic space for cross-media retrieval. Experiments are carried out on two smaller cross-media datasets, Wikipedia and Pascal Sentence, and one larger cross-media dataset, INRIA-Websearch. The results show that the proposed method can effectively extract text and image features and realize the correlation and matching of cross-media data in a high-level semantic space. Comparisons with similar cross-media retrieval methods show that the proposed method achieves the best retrieval performance in terms of the MAP metric.
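
The retrieval and evaluation step described in the abstract can be illustrated with a minimal sketch. It assumes that each text document and each image has already been mapped to a probability distribution over the same semantic classes (the abstract does not say how these distributions are compared, so cosine similarity is an assumption here), and it uses a common definition of average precision in which an item is relevant when it shares the query's class label. All function names and the toy data are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(query_vec, gallery_vecs):
    """Return gallery indices sorted by decreasing similarity to the query."""
    scores = [cosine(query_vec, g) for g in gallery_vecs]
    return sorted(range(len(gallery_vecs)), key=lambda i: -scores[i])

def average_precision(ranked_labels, query_label):
    """AP of one ranked list; an item is relevant if its label equals the query's."""
    hits, total = 0, 0.0
    for k, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            total += hits / k  # precision at each relevant position
    return total / hits if hits else 0.0

def mean_average_precision(query_vecs, query_labels, gallery_vecs, gallery_labels):
    """Mean of the per-query AP values (text-to-image direction in this toy setup)."""
    aps = []
    for q, q_label in zip(query_vecs, query_labels):
        order = rank(q, gallery_vecs)
        aps.append(average_precision([gallery_labels[i] for i in order], q_label))
    return sum(aps) / len(aps)

# Hypothetical class-probability vectors over three semantic classes.
text_probs = [[0.8, 0.1, 0.1], [0.1, 0.7, 0.2]]
text_labels = [0, 1]
image_probs = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]
image_labels = [0, 1, 2, 0]

map_score = mean_average_precision(text_probs, text_labels, image_probs, image_labels)
print(f"text-to-image MAP on toy data: {map_score:.3f}")
```

Swapping the roles of `text_probs` and `image_probs` gives the image-to-text direction; because both modalities live in the same class-probability space, the same ranking function serves both.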

Key words: Cross-media information retrieval, Ensemble learning, Residual networks, Semantic correlation, Word embedding

CLC Number: TP391