Computer Science (计算机科学), 2022, Vol. 49, Issue 5: 33-42. doi: 10.11896/jsjkx.210200157
韩红旗1,2, 冉亚鑫1,2, 张运良1,2, 桂婕1, 高雄1,2, 易梦琳1,2
HAN Hong-qi1,2, RAN Ya-xin1,2, ZHANG Yun-liang1,2, GUI Jie1, GAO Xiong1,2, YI Meng-lin1,2
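Abstract: Severe heterogeneity and semantic gaps between data of different media types prevent their semantic similarity from being computed directly, which hampers both the implementation and the effectiveness of cross-media retrieval. Existing common-subspace learning methods can establish cross-media semantic correlations and support retrieval. However, most of them rely on generic feature-extraction techniques and classify poorly during semantic matching, so they cannot effectively compute high-level semantic correlations between cross-media data, which limits retrieval performance. To address this, a cross-media correlation method, Stacking-DSCM-WR, is proposed for cross-media retrieval between documents and images. The method builds document feature vectors with word-embedding techniques, extracts image feature vectors with a residual network (ResNet), and uses deep canonical correlation analysis (DCCA) to project data of the two modalities into a common subspace. It then applies the Stacking ensemble learning algorithm to obtain the distributions of texts and images over the same high-level conceptual semantic space, so that data of the two modalities can be semantically matched and their similarity computed. Cross-media retrieval experiments were conducted on two small cross-media datasets, Wikipedia and Pascal Sentence, and on a larger-scale cross-media dataset, INRIA-Websearch. The results confirm that the proposed method extracts text and image features effectively and correlates and matches cross-media data in a high-level semantic space. A comparison with similar cross-media retrieval methods on mean average precision (MAP) shows that the method achieves comparatively good retrieval performance.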
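The abstract says only that document vectors are formed with word-embedding techniques. One common choice, assumed here and not confirmed by the abstract, is to average pretrained word vectors (for example word2vec or GloVe embeddings) over a document's tokens. The minimal sketch below takes the embedding table as a plain dictionary. The function `doc_vector` and the toy vocabulary are hypothetical.

```python
import numpy as np


def doc_vector(tokens, embeddings, dim):
    """Hypothetical document encoder: mean of the known tokens' word vectors.

    tokens: list of str; embeddings: dict mapping word -> 1-D array of length dim.
    Out-of-vocabulary tokens are skipped; a document with no known token maps to zeros.
    """
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)


# Toy 2-d "pretrained" embeddings, made up for illustration.
emb = {"river": np.array([1.0, 0.0]), "bank": np.array([0.0, 1.0])}
v = doc_vector(["river", "bank", "unseen"], emb, dim=2)  # average of the two known words
```

Mean pooling ignores word order. TF-IDF weighting or a contextual encoder are common alternatives when order or word importance matters.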
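Deep CCA trains one network per modality to maximize the canonical correlation of the two networks' outputs. Linear CCA is then applied to those outputs to obtain the common subspace. The numpy sketch below shows only that linear CCA step, with raw feature matrices standing in for the network outputs. The name `linear_cca` and the ridge term `reg` are illustrative assumptions, not the paper's settings.

```python
import numpy as np


def _inv_sqrt(c):
    """Inverse square root of a symmetric positive-definite matrix via eigh."""
    w, v = np.linalg.eigh(c)
    return v @ np.diag(1.0 / np.sqrt(w)) @ v.T


def linear_cca(x, y, k, reg=1e-4):
    """Linear CCA between paired views x (n, dx) and y (n, dy).

    Returns projection matrices a (dx, k) and b (dy, k), the column means used
    for centering, and the top-k canonical correlations. `reg` is a small ridge
    term (an assumed value) that keeps the covariance matrices invertible.
    """
    n = x.shape[0]
    mx, my = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - mx, y - my
    cxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])
    cyy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    cxy = xc.T @ yc / (n - 1)
    cxx_is, cyy_is = _inv_sqrt(cxx), _inv_sqrt(cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    u, s, vt = np.linalg.svd(cxx_is @ cxy @ cyy_is, full_matrices=False)
    a = cxx_is @ u[:, :k]
    b = cyy_is @ vt.T[:, :k]
    return a, b, mx, my, s[:k]


# Synthetic paired "text" and "image" features sharing a 2-d latent signal.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2))
x = z @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(500, 10))
y = z @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(500, 8))
a, b, mx, my, corr = linear_cca(x, y, k=2)
zx, zy = (x - mx) @ a, (y - my) @ b  # both views in the common subspace
```

The columns of `zx` and `zy` are paired coordinates in the shared subspace. `corr` holds their canonical correlations in decreasing order.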
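In the Stacking step, each item is mapped to a distribution over semantic categories, so texts and images can be compared directly. The scikit-learn sketch below assumes the features are already projected into the common subspace. It trains one `StackingClassifier` per modality, treats its `predict_proba` output as the semantic distribution, and ranks images for a text query by cosine similarity. The abstract does not specify the base learners, the logistic-regression meta-learner, per-modality training, or the cosine measure, so all of these are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC


def make_stacker(seed=0):
    """Stacking ensemble; the base learners here are hypothetical stand-ins."""
    base = [
        ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
        ("svm", SVC(probability=True, random_state=seed)),
    ]
    # The meta-learner is fit on the base learners' out-of-fold class probabilities.
    return StackingClassifier(
        estimators=base,
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method="predict_proba",
        cv=5,
    )


def semantic_distributions(train_x, train_y, x, seed=0):
    """Fit a stacker on one modality and return P(category | item) for each row of x."""
    return make_stacker(seed).fit(train_x, train_y).predict_proba(x)


def rank_by_cosine(query, gallery):
    """Gallery indices ordered from most to least similar to the query."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(g @ q))


# Synthetic paired data: 3 categories in a 2-d "common subspace".
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 60)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
txt = centers[labels] + rng.normal(scale=0.5, size=(180, 2))
img = centers[labels] + rng.normal(scale=0.5, size=(180, 2))

# Even rows train the stackers; odd rows form the test queries and gallery.
txt_p = semantic_distributions(txt[::2], labels[::2], txt[1::2])
img_p = semantic_distributions(img[::2], labels[::2], img[1::2])
test_labels = labels[1::2]
order = rank_by_cosine(txt_p[0], img_p)  # text query 0 against all test images
```

Comparing probability vectors rather than raw subspace coordinates means both modalities are matched in the same category-level space, which is the role the abstract assigns to the Stacking step.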
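The methods are compared on MAP. The minimal standard-library sketch below assumes the convention common in cross-media retrieval: each query ranks the whole gallery, a result is relevant when it shares the query's category, and average precision is normalized by the number of relevant items in the ranked list. The abstract does not say whether the paper scores the full ranking or only the top K results.

```python
def average_precision(relevant_flags):
    """AP of one ranked list; relevant_flags[i] is True if the i-th result is relevant."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevant_flags, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this relevant position
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of per-query AP over a list of ranked relevance lists."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Query 1: AP = (1/1 + 2/3) / 2 = 5/6; query 2: AP = (1/2) / 1 = 1/2.
score = mean_average_precision([[True, False, True], [False, True]])
```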