Computer Science, 2022, Vol. 49, Issue (5): 33-42. doi: 10.11896/jsjkx.210200157

• Computer Graphics & Multimedia

  • Corresponding author: HAN Hong-qi (bithhq@163.com)

Study on Cross-media Information Retrieval Based on Common Subspace Classification Learning

HAN Hong-qi1,2, RAN Ya-xin1,2, ZHANG Yun-liang1,2, GUI Jie1, GAO Xiong1,2, YI Meng-lin1,2   

  1 Institute of Scientific and Technical Information of China, Beijing 100038, China
  2 Key Laboratory of Rich-media Knowledge Organization and Service of Digital Publishing Content, National Press and Publication Administration, Beijing 100038, China
  • Received: 2021-02-24 Revised: 2021-07-15 Online: 2022-05-15 Published: 2022-05-06
  • About author: HAN Hong-qi, born in 1971, Ph.D., professor, Ph.D. supervisor, is a member of the China Computer Federation. His main research interests include data mining, cross-media retrieval and knowledge engineering.
  • Supported by:
    ISTIC Key Work Project(ZD2020-09) and National Natural Science Foundation of China(71473237).


Abstract: Semantic similarity between data of different media types cannot be computed directly because of the severe heterogeneity gap and semantic gap between them, which hinders both the implementation and the effectiveness of cross-media retrieval. Common subspace learning can establish cross-media semantic association and support retrieval, but its retrieval performance remains unsatisfactory, mainly because existing methods rely on generic feature extraction techniques and on classifiers that perform poorly during semantic matching. To address this problem, this study proposes a cross-media correlation method, Stacking-DSCM-WR, for retrieval between documents and images. WR indicates that text features are extracted with word embeddings and image features with a residual network (ResNet). DSCM stands for deep semantic correlation and matching, in which deep canonical correlation analysis projects data of different modalities into a common subspace. Stacking is an ensemble learning algorithm; it is used to obtain the distributions of text documents and images over the same high-level conceptual semantic space, so that the two modalities can be semantically matched and their similarity computed. Experiments are carried out on two small cross-media datasets, Wikipedia and Pascal Sentence, and on one larger cross-media dataset, INRIA-Websearch. The results show that the proposed method effectively extracts text and image features and achieves correlation and matching of cross-media data in a high-level semantic space. Comparisons with similar cross-media retrieval methods on the MAP metric show that the proposed method achieves better retrieval performance.

Key words: Cross-media information retrieval, Ensemble learning, Residual networks, Semantic correlation, Word embedding

CLC Number: TP391
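
The matching and evaluation steps mentioned in the abstract can be illustrated with a minimal sketch. It assumes each document and image has already been mapped to a probability distribution over the same set of concepts (in the described method this would come from the Stacking classifiers; here the vectors are made up by hand). The use of cosine similarity, the three-concept toy data, and all function names are assumptions for illustration, not the authors' implementation, and the paper's exact MAP protocol (for example, any rank cutoff) is not stated on this page.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query, candidates):
    """Indices of candidates sorted by decreasing similarity to the query."""
    return sorted(range(len(candidates)),
                  key=lambda i: cosine(query, candidates[i]),
                  reverse=True)


def average_precision(ranked_labels, query_label):
    """AP of one ranked list; an item is relevant if it shares the query's label."""
    hits, total = 0, 0.0
    for position, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            total += hits / position
    return total / hits if hits else 0.0


def mean_average_precision(queries, query_labels, candidates, candidate_labels):
    """Mean of the per-query average precision over all queries."""
    aps = []
    for query, query_label in zip(queries, query_labels):
        order = rank(query, candidates)
        aps.append(average_precision([candidate_labels[i] for i in order],
                                     query_label))
    return sum(aps) / len(aps)


# Hypothetical class-probability vectors over three made-up concepts.
text_probs = [[0.8, 0.1, 0.1], [0.1, 0.7, 0.2]]
text_labels = [0, 1]
image_probs = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8], [0.6, 0.3, 0.1]]
image_labels = [0, 1, 2, 0]

map_text_to_image = mean_average_precision(
    text_probs, text_labels, image_probs, image_labels)
print(f"text-to-image MAP: {map_text_to_image:.3f}")
```

Swapping the roles of `text_probs` and `image_probs` gives the image-to-text direction; both directions are typically reported in cross-media retrieval evaluations.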