Computer Science, 2020, Vol. 47, Issue (12): 205-209. doi: 10.11896/jsjkx.191000132


Cross-modal Retrieval Method for Special Vehicles Based on Deep Learning

SHAO Yang-xue1,2, MENG Wei1,2, KONG Deng-zhen2,3, HAN Lin-xuan2,3, LIU Yang1,2,3   

  1 Henan Engineering Laboratory of Spatial Information Processing, Henan University, Kaifeng, Henan 475004, China
  2 School of Computer and Information Engineering, Henan University, Kaifeng, Henan 475004, China
  3 Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, Henan 475004, China
  • Received: 2019-10-21  Revised: 2020-04-24  Published: 2020-12-17
  • About author: SHAO Yang-xue, born in 1994, postgraduate, is a member of China Computer Federation. Her main research interests include cross-modal retrieval, machine learning and brain-like computing.
    LIU Yang, born in 1971, Ph.D, associate professor, M.S. supervisor, is a member of China Computer Federation. His main research interests include brain-inspired computing (i.e., multimedia neural cognitive computing, multisource cross-modal target recognition, and audio-visual cross-media semantic retrieval) and temporal-spatial information high-performance computing in remote sensing.
  • Supported by:
    Key Research and Promotion Projects of Henan Province (192102210096, 182102310724).

Abstract: Ensuring the right of way of special vehicles is a prerequisite for the rational allocation of urban traffic resources and for carrying out and safeguarding emergency rescue. Cross-modal identification of special vehicles is a core technology for intelligent transportation, especially while the Internet of Vehicles is not yet mature and driverless and human-driven vehicles will share the road for a long time to come. In that setting, giving way appropriately to special vehicles that are on a mission is particularly important. To meet the need of driverless vehicles to identify special vehicles, this paper constructs a cross-modal retrieval and recognition network (CMR2Net) and proposes a deep-learning-based method for cross-modal recognition and retrieval of special vehicles. CMR2Net consists of two convolutional sub-networks and one feature fusion network. The convolutional sub-networks extract features from the image and the audio of a special vehicle; a similarity measure is then applied in the high-level semantic space to match the features, which achieves cross-modal retrieval and recognition. Experiments on a special-vehicle cross-modal dataset show that the method achieves a high recognition rate on cross-modal retrieval and recognition tasks. Furthermore, it can accurately identify special vehicles even when one modality is absent. This research offers theoretical guidance for improving the performance of the "urban brain" and can also be applied in engineering practice to design, realize, and improve smart transportation.
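
The matching step described in the abstract (comparing image and audio features in a shared semantic space) can be illustrated with a minimal sketch. It is not the authors' CMR2Net implementation. The abstract does not name the similarity measure, so cosine similarity is assumed here; the function names, class labels, and three-dimensional embedding values are all hypothetical stand-ins for the outputs of the two convolutional sub-networks.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def retrieve(query, gallery, top_k=1):
    """Return the labels of the top_k gallery items most similar to query.

    gallery is a list of (label, vector) pairs from the other modality.
    """
    scored = [(cosine_similarity(query, vec), label) for label, vec in gallery]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in scored[:top_k]]


# Hypothetical image embeddings (assumed values, not from the paper).
image_gallery = [
    ("ambulance", [0.9, 0.1, 0.0]),
    ("fire_truck", [0.1, 0.9, 0.1]),
    ("police_car", [0.0, 0.2, 0.9]),
]

# Hypothetical audio embedding of a siren, used as the query.
audio_query = [0.2, 0.8, 0.1]

print(retrieve(audio_query, image_gallery))  # ['fire_truck']
```

Because the query comes from one modality and the gallery from the other, the same routine would also cover the case the abstract mentions where only one modality is available at query time, provided both sub-networks map into the same space.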

Key words: Convolutional neural networks, Cross-modal retrieval, Deep learning, Similarity measurement, Small sample

CLC Number: TP391