Computer Science ›› 2021, Vol. 48 ›› Issue (6A): 122-126. doi: 10.11896/jsjkx.201100026

• Image Processing & Multimedia Technology •

Cross Media Retrieval Method Based on Residual Attention Network

FENG Jiao, LU Chang-yu

  1. College of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
  • Online: 2021-06-10  Published: 2021-06-17
  • Corresponding author: FENG Jiao (jiao.feng@nuist.edu.cn)
  • Supported by:
    National Natural Science Foundation of China (61501244)

  • About author: FENG Jiao, Ph.D., associate professor. Her main research interests include signal processing and deep learning.


Abstract: With the rapid development of multimedia technology, cross-media retrieval has gradually replaced traditional single-media retrieval as the mainstream form of information retrieval. Existing cross-media retrieval methods are highly complex and cannot fully mine the fine-grained features of the data, which causes deviations in the mapping process and makes it difficult to learn accurate cross-media associations. To address these problems, this paper proposes a cross-media retrieval method based on a residual attention network (CR-RAN). First, to better extract the key features of different media data while simplifying the cross-media retrieval model, a residual neural network incorporating an attention mechanism is proposed. Then, a joint loss function for cross-media retrieval is proposed, which constrains the network's mapping process to enhance its semantic discrimination ability and improve retrieval accuracy. Experimental results show that, compared with some existing methods, the proposed method can better learn the associations between different media data and effectively improves the accuracy of cross-media retrieval.

Key words: Attention mechanism, Cross media retrieval, Joint loss function, Residual neural network
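The abstract only names the two proposed components, an attention-augmented residual network and a joint loss; the paper's exact architecture and loss formulation are not given on this page. As a rough, hypothetical illustration of the general ideas, the NumPy sketch below combines a squeeze-and-excitation-style channel attention inside an identity-shortcut residual block with a toy contrastive-style joint loss. All function names, tensor shapes, and the `margin`/`alpha` weighting are assumptions for illustration, not the authors' design.

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Squeeze-and-excitation-style channel attention (illustrative).
    x: feature map of shape (C, H, W); w1, w2: small FC weight matrices."""
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    s = x.mean(axis=(1, 2))
    # Excitation: bottleneck FC layer with ReLU, then sigmoid gate per channel
    h = np.maximum(0.0, w1 @ s)
    a = 1.0 / (1.0 + np.exp(-(w2 @ h)))      # per-channel weights in (0, 1)
    # Re-weight each channel of the feature map
    return x * a[:, None, None]

def residual_attention_block(x, w1, w2):
    """Identity shortcut around the attention-refined features,
    in the residual form y = x + F(x)."""
    return x + channel_attention(x, w1, w2)

def joint_loss(img_emb, txt_emb, same_class, margin=1.0, alpha=0.5):
    """Toy joint loss over a cross-media embedding pair: pull a matched
    image/text pair together, push a mismatched pair at least `margin` apart."""
    d = np.linalg.norm(img_emb - txt_emb)
    if same_class:
        return d ** 2
    return alpha * max(0.0, margin - d) ** 2
```

Because the sigmoid gate lies in (0, 1), the attention branch only attenuates channels, and the identity shortcut guarantees the block can fall back to a plain pass-through, which is the usual motivation for wrapping attention in a residual connection.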

CLC number: TP391