Computer Science ›› 2020, Vol. 47 ›› Issue (12): 205-209. doi: 10.11896/jsjkx.191000132

• Computer Graphics & Multimedia •

  • Corresponding author: LIU Yang (ly.sci.art@gmail.com)
  • About author: 624535306@qq.com

Cross-modal Retrieval Method for Special Vehicles Based on Deep Learning

SHAO Yang-xue1,2, MENG Wei1,2, KONG Deng-zhen2,3, HAN Lin-xuan2,3, LIU Yang1,2,3   

  1 Henan Engineering Laboratory of Spatial Information Processing, Henan University, Kaifeng, Henan 475004, China
    2 School of Computer and Information Engineering, Henan University, Kaifeng, Henan 475004, China
    3 Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, Henan 475004, China
  • Received:2019-10-21 Revised:2020-04-24 Published:2020-12-17
  • About author: SHAO Yang-xue, born in 1994, postgraduate, is a member of China Computer Federation. Her main research interests include cross-modal retrieval, machine learning and brain-like computing.
    LIU Yang, born in 1971, Ph.D, associate professor, M.S. supervisor, is a member of China Computer Federation. His main research interests include brain-inspired computing (i.e. multimedia neural cognitive computing, multisource cross-modal target recognition, and audio-visual cross-media semantic retrieval) and temporal-spatial information high-performance computing in remote sensing.
  • Supported by:
    Key Research and Promotion Projects of Henan Province(192102210096,182102310724).


Abstract: Ensuring the right of way of special vehicles performing their missions is the premise of rationally allocating urban traffic resources and of implementing and guaranteeing emergency rescue. Cross-modal identification of special vehicles is a core technology for realizing intelligent transportation. It is especially important in an environment where the Internet of Vehicles is not yet mature and mixed unmanned and manned traffic will exist for a long time, so that driverless vehicles can reasonably make way for special vehicles that are performing a mission. Aiming at the demand of driverless vehicles for special vehicle identification, this paper constructs a cross-modal retrieval and recognition net (CMR2Net) and proposes a deep-learning-based method for cross-modal retrieval and recognition of special vehicles. CMR2Net consists of two convolutional sub-networks and one feature fusion network. The convolutional sub-networks extract the image and audio features of the special vehicle, and a similarity measurement method is then applied in the high-level semantic space to perform feature matching, achieving cross-modal retrieval and recognition. Cross-modal identification experiments on a special-vehicle cross-modal dataset show that the proposed method achieves a high recognition rate on cross-modal retrieval and recognition tasks; furthermore, it can accurately identify special vehicles even when one modality is absent. This research has theoretical guiding significance for improving the performance of the "urban brain" and engineering application value for designing, realizing and improving future smart transportation.

Key words: Convolutional neural networks, Cross-modal retrieval, Deep learning, Similarity measurement, Small sample
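The retrieval step the abstract describes, matching an image-derived feature vector against audio-derived feature vectors by a similarity measure in a shared high-level semantic space, can be sketched as follows. This is a minimal illustration only: the embeddings and class labels are hypothetical, and the convolutional sub-networks that would produce them in CMR2Net are abstracted away.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two feature vectors in the shared semantic space.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def cross_modal_retrieve(query_embedding, gallery):
    # Return the gallery label whose embedding is most similar to the query.
    # `gallery` maps labels to embeddings produced by the other modality's branch.
    return max(gallery, key=lambda label: cosine_similarity(query_embedding, gallery[label]))

# Hypothetical semantic embeddings (in practice, outputs of the image and
# audio convolutional sub-networks after feature fusion).
audio_gallery = {
    "ambulance":   [0.9, 0.1, 0.0],
    "fire_engine": [0.1, 0.8, 0.2],
    "police_car":  [0.0, 0.2, 0.9],
}
image_query = [0.85, 0.15, 0.05]  # image embedding of an ambulance

print(cross_modal_retrieve(image_query, audio_gallery))  # → ambulance
```

Because matching happens in the shared space rather than on raw pixels or waveforms, a query from either modality can retrieve the other, which is also why recognition can still succeed when one modality is missing.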

CLC number: TP391
[1] LIN Z H.Multimodal Deep Learning Object Detecting and Application[D].Chengdu:University of Electronic Science and Technology of China,2018.
[2] HE X,TANG Y P,CHEN P.Fast hash vehicle retrieval method based on multitasking[J].Journal of Image and Graphics,2018,23(12):1801-1812.
[3] LI X Y,NIE X S,CUI C R,et al.Image Retrieval Algorithm Based on Transfer Learning[J].Computer Science,2019,46(1):73-77.
[4] JIANG Z T,QIN J Q,HU S.Multi-spectral Scene Recognition Method Based on Multi-way Convolution Neural Network[J].Computer Science,2019,46(9):265-270.
[5] ARANDJELOVIĆ R,ZISSERMAN A.Look,Listen and Learn[J/OL].https://ui.adsabs.harvard.edu/abs/2017arXiv170508168A.
[6] RASIWASIA N,PEREIRA J C,COVIELLO E,et al.A New Approach to Cross-Modal Multimedia Retrieval[C]//International Conference on Multimedia.2010:521-535.
[7] JIAN L,RAN H,SUN Z,et al.Group-Invariant Cross-Modal Subspace Learning[C]//International Joint Conference on Artificial Intelligence.Seattle,WA,USA:IEEE Press,2016:1739-1745.
[8] SHARMA A,KUMAR A,DAUME H,et al.Generalized Multiview Analysis:A discriminative latent space[C]//IEEE Conference on Computer Vision & Pattern Recognition.2012:2160-2167.
[9] NGIAM J,KHOSLA A,KIM M,et al.Multimodal deep learning[C]//International Conference on Machine Learning.Washington,USA,2011:689-696.
[10] SRIVASTAVA N,SALAKHUTDINOV R.Multimodal Learning with Deep Boltzmann Machines[C]//Advances in Neural Information Processing Systems.2012:2222-2230.
[11] FENG Y G,CAI G Y.Cross-modal Retrieval Fusing Multilayer Semantic[J].Computer Science,2019,46(3):227-233.
[12] KAISER L,GOMEZ A N,SHAZEER N,et al.One Model To Learn Them All[J/OL].https://ui.adsabs.harvard.edu/abs/2017arXiv170605137K.
[13] AYTAR Y,VONDRICK C,TORRALBA A.See,Hear,and Read:Deep Aligned Representations[J/OL].https://ui.adsabs.harvard.edu/abs/2017arXiv170600932A.
[14] ARANDJELOVIĆ R,ZISSERMAN A.Look,Listen and Learn[EB/OL].https://ui.adsabs.harvard.edu/abs/2017arXiv170508168A.
[15] HAO W,ZHANG Z,HE G.CMCGAN:A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation[C]//AAAI Conference on Artificial Intelligence (AAAI).New Orleans,LA,USA:AAAI,2018:6886-6893.
[16] LIU Y,CAI K,LIU C,et al.CSRNCVA:a Model of Cross-media Semantic Retrieval based on Neural Computing of Visual and Auditory Sensations[J].Neural Network World,2018,28(4):305-323.
[17] LIU Y,TU C L,ZHENG F B.Research of Neural Cognitive Computing Model for Visual and Auditory Cross-media Retrieval[J].Computer Science,2015,42(3):19-25,30.
[18] JIN K H,MCCANN M T,FROUSTEY E,et al.Deep Convolutional Neural Network for Inverse Problems in Imaging[J].IEEE Transactions on Image Processing,2017,26(9):4509-4522.
[19] LIN M,CHEN Q,YAN S.Network In Network[J/OL].https://ui.adsabs.harvard.edu/abs/2013arXiv1312.4400L.
[20] HAHNLOSER R H R,SEUNG H S,SLOTINE J J.Permitted and forbidden sets in symmetric threshold-linear networks[J].Neural Computation,2003,15(3):621-638.
[21] VAPNIK V N.Statistical Learning Theory[M].New York:Wiley,1998.
[22] HAO Y,QI C.Robust virtual frontal face synthesis from a given pose using regularized linear regression[C]//International Conference on Image Processing(ICIP).Paris:IEEE Press,2014:4702-4706.
[23] LIU W,WEN Y,YU Z,et al.Large-margin softmax loss for convolutional neural networks[C]//International Conference on Machine Learning.Vienna,Austria:ICML,2016:69-75.