Computer Science (计算机科学), 2018, Vol. 45, Issue 3: 23-28. doi: 10.11896/j.issn.1002-137X.2018.03.004
毛典辉,薛子育,李子沁,王帆
MAO Dian-hui, XUE Zi-yu, LI Zi-qin and WANG Fan
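Abstract: In the current era of big data, images have become an important source of information for the general public because of their rich semantics. Image semantic analysis based on deep models is a technique that uses a deep model to convert image content into intuitively understandable semantic knowledge, and it has attracted wide attention from researchers at home and abroad. According to the semantic level of the generated target, the technique can be divided into three categories: single-category, multi-label, and sentence. This paper first introduces the structural characteristics of the deep models corresponding to these three categories of methods, and compares their technical characteristics and state of development from the perspective of how the technology has evolved. It then focuses on image-to-sentence conversion methods, discussing their state of development and the differences in their application scenarios and performance requirements. It also breaks the image-to-sentence conversion process down into steps and discusses each, and gives a detailed comparative analysis from both the academic and the industrial side, pointing out their different research emphases and the corresponding state of development. Finally, it summarizes image-to-sentence conversion methods based on deep models and gives an outlook, identifying the current problems and development trends of the approach.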