Computer Science, 2020, Vol. 47, Issue (8): 233-240. doi: 10.11896/jsjkx.190600109
徐守坤, 倪楚涵, 吉晨晨, 李宁
XU Shou-kun, NI Chu-han, JI Chen-chen, LI Ning
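Abstract: In recent years, construction accidents caused by workers not wearing safety helmets have occurred frequently. To reduce the accident rate, this paper studies image captioning of workers' helmet-wearing status. Current neural-network-based image captioning methods lack interpretability and describe details insufficiently, and research on captioning construction-scene images is scarce. To address these problems, the paper proposes a method that combines the YOLOv3 (You Only Look Once) detection algorithm with semantic rules and sentence templates to progressively generate sentences describing helmet wearing. First, data are collected to build a helmet-wearing detection dataset and an image caption dataset. Second, the K-means algorithm is used to determine anchor box parameters suited to this dataset, which are then used to train YOLOv3 and perform detection. Third, a semantic rule is predefined and combined with the detection results to extract visual concepts. Finally, the extracted visual concepts are filled into sentence templates generated from the image caption annotations, producing sentences that describe helmet wearing by workers in construction scenes. The experimental environment is built on Ubuntu 16.04 with the Keras deep learning framework, and different algorithms are compared on the self-built helmet-wearing dataset. Experimental results show that the proposed method not only effectively determines the numbers of helmet wearers and non-wearers, but also achieves scores of 0.722 on BLEU-1 and 0.957 on CIDEr, improvements of 6.9% and 14.8% respectively over the other methods. These results demonstrate the effectiveness and superiority of the method.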
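The anchor-selection step can be sketched as follows. This is a minimal illustration that assumes the distance d = 1 - IoU between box sizes, which YOLOv2 and YOLOv3 use for anchor clustering, and k = 9 anchors (three per YOLOv3 output scale). The abstract states neither the distance measure nor k, and the function names are hypothetical.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float]  # (width, height) of a labelled box

def iou_wh(a: Box, b: Box) -> float:
    """IoU of two boxes aligned at a common corner, so only width/height matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union

def kmeans_anchors(boxes: List[Box], k: int = 9, iters: int = 100, seed: int = 0) -> List[Box]:
    """Cluster box sizes with distance 1 - IoU; return k anchors sorted by area.

    Assumption: IoU-based distance and mean-based centre updates, as in the
    YOLOv2/YOLOv3 anchor heuristic; the paper's exact settings are not given.
    """
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)
    for _ in range(iters):
        # Assign each box to the centre with the highest IoU (lowest 1 - IoU).
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: iou_wh(b, centers[i]))
            clusters[best].append(b)
        # Move each centre to the mean width/height of its cluster;
        # an empty cluster keeps its previous centre.
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:
                new_centers.append((sum(w for w, _ in members) / len(members),
                                    sum(h for _, h in members) / len(members)))
            else:
                new_centers.append(c)
        if new_centers == centers:  # converged: assignments no longer change
            break
        centers = new_centers
    return sorted(centers, key=lambda wh: wh[0] * wh[1])

def mean_best_iou(boxes: List[Box], anchors: List[Box]) -> float:
    """Average IoU between each box and its best-matching anchor (a fit-quality score)."""
    return sum(max(iou_wh(b, a) for a in anchors) for b in boxes) / len(boxes)
```

With real annotations, `boxes` would hold the (width, height) of every labelled box, typically rescaled to the network input size (for example 416 × 416) before clustering. The mean best IoU offers one way to compare different values of k.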
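The last two steps, extracting visual concepts with a semantic rule and filling them into a sentence template, might look roughly like this. Several details here are hypothetical: the class names `helmet` and `no_helmet`, the 0.5 confidence threshold, the counting rule and the template wording. The abstract does not give the paper's actual rule, and its templates are derived from the caption annotations.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Detection:
    label: str    # hypothetical classes: "helmet" (worker wearing one) or "no_helmet"
    score: float  # detector confidence in [0, 1]

NUMBER_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five"}

def extract_concepts(dets: List[Detection], threshold: float = 0.5) -> Dict[str, int]:
    """Hypothetical semantic rule: count confident detections of each class."""
    counts = Counter(d.label for d in dets if d.score >= threshold)
    return {"wearing": counts["helmet"], "not_wearing": counts["no_helmet"]}

def clause(k: int, wearing: bool) -> str:
    """Render one template slot, e.g. 'two workers are wearing safety helmets'."""
    noun = f"{NUMBER_WORDS.get(k, str(k))} worker{'s' if k != 1 else ''}"
    verb = "is" if k == 1 else "are"
    obj = "a safety helmet" if k == 1 else "safety helmets"
    return f"{noun} {verb} {'' if wearing else 'not '}wearing {obj}"

def fill_template(concepts: Dict[str, int]) -> str:
    """Fill the hypothetical template 'In the construction scene, <clauses>.'"""
    parts = []
    if concepts["wearing"] > 0:
        parts.append(clause(concepts["wearing"], True))
    if concepts["not_wearing"] > 0:
        parts.append(clause(concepts["not_wearing"], False))
    if not parts:
        return "No workers are detected in the construction scene."
    return "In the construction scene, " + " and ".join(parts) + "."
```

The counts come directly from the detector, so the sentence states the numbers of wearers and non-wearers explicitly. A rule-plus-template design makes this kind of output easy to trace back to its inputs, in contrast to an end-to-end neural captioner.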
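BLEU-1, one of the reported metrics, is the clipped unigram precision of a generated caption against its reference captions, scaled by a brevity penalty. The sketch below computes it for a single sentence. Evaluation toolkits typically aggregate counts over the whole test set (corpus-level BLEU), so this per-sentence version is illustrative only and does not reproduce how the 0.722 figure was obtained.

```python
import math
from collections import Counter
from typing import List

def bleu1(candidate: List[str], references: List[List[str]]) -> float:
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty."""
    if not candidate:
        return 0.0
    cand_counts = Counter(candidate)
    # Clip each word's count by its maximum count in any single reference.
    max_ref: Counter = Counter()
    for ref in references:
        for w, c in Counter(ref).items():
            max_ref[w] = max(max_ref[w], c)
    clipped = sum(min(c, max_ref[w]) for w, c in cand_counts.items())
    precision = clipped / len(candidate)
    # Brevity penalty uses the reference length closest to the candidate length.
    c_len = len(candidate)
    r_len = min((abs(len(r) - c_len), len(r)) for r in references)[1]
    bp = 1.0 if c_len > r_len else math.exp(1 - r_len / c_len)
    return bp * precision
```

For example, the candidate "a worker wears a helmet" against the reference "a worker wears a safety helmet" has unigram precision 1.0. It is still penalised for being one word shorter, so its score is exp(-0.2).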