Computer Science ›› 2020, Vol. 47 ›› Issue (8): 233-240.doi: 10.11896/jsjkx.190600109


Image Caption of Safety Helmets Wearing in Construction Scene Based on YOLOv3

XU Shou-kun, NI Chu-han, JI Chen-chen, LI Ning   

  1. School of Information Science and Engineering, Changzhou University, Changzhou, Jiangsu 213164, China
  • Revised: 2019-06-19  Online: 2020-08-15  Published: 2020-08-10
  • About author: XU Shou-kun, born in 1972, Ph.D, professor, is a member of China Computer Federation. His main research interests include artificial intelligence and pervasive computing.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61803050).

Abstract: In recent years, construction accidents caused by workers not wearing safety helmets have occurred frequently. To reduce the accident rate, this paper studies image captioning of safety helmet wearing by construction workers. Existing neural-network-based image caption methods lack interpretability and detailed description, and research on image captioning for construction scenes is relatively scarce. To solve this problem, this paper combines the YOLOv3 algorithm with a method based on semantic rules and sentence templates to generate descriptions of safety helmet wearing step by step. Firstly, images are collected, and a safety helmet wearing detection dataset and an image caption dataset are constructed. Secondly, the K-means algorithm is used to determine anchor box parameters suited to the dataset for the training and detection of the YOLOv3 network. Thirdly, a predefined semantic rule is combined with the object detection results to extract visual concepts. Finally, the extracted visual concepts are filled into sentence templates generated from the image caption annotations, yielding description sentences about whether workers in the construction scene are wearing safety helmets. The experimental environment is built on Ubuntu 16.04 with the Keras deep learning framework, and different algorithms are compared on the self-made safety helmet wearing datasets. The experimental results show that the proposed method not only accurately determines the numbers of safety helmet wearers and non-wearers, but also achieves 0.722 and 0.957 on the BLEU-1 and CIDEr evaluation metrics respectively, which are 6.9% and 14.8% higher than other methods, demonstrating the effectiveness and superiority of the proposed method.
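The abstract mentions using K-means to determine anchor box parameters but gives no implementation details. A minimal, hypothetical sketch of the standard YOLOv2/v3-style approach, clustering box shapes (w, h) with 1 − IoU as the distance measure, might look like the following; the function names and toy data are illustrative, not taken from the paper:

```python
import random

def iou(box, anchor):
    """IoU between a (w, h) box and a (w, h) anchor, both placed at the origin."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) box shapes with 1 - IoU as distance, as in YOLOv2/v3."""
    random.seed(seed)
    anchors = random.sample(boxes, k)  # initialize anchors from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # assign each box to the anchor with the highest IoU (lowest 1 - IoU)
            best = max(range(k), key=lambda i: iou(b, anchors[i]))
            clusters[best].append(b)
        new_anchors = []
        for i, members in enumerate(clusters):
            if members:
                # new anchor = mean width/height of the cluster
                w = sum(m[0] for m in members) / len(members)
                h = sum(m[1] for m in members) / len(members)
                new_anchors.append((w, h))
            else:
                new_anchors.append(anchors[i])  # keep empty clusters unchanged
        if new_anchors == anchors:
            break  # converged
        anchors = new_anchors
    return sorted(anchors)

# toy box shapes: small helmet boxes vs. large person boxes
boxes = [(20, 22), (24, 26), (22, 20), (80, 160), (90, 170), (85, 150)]
print(kmeans_anchors(boxes, k=2))
```

The resulting (w, h) pairs would then replace the default anchors in the YOLOv3 configuration before training on the helmet dataset.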

Key words: Image caption method, K-means clustering algorithm, Safety helmet wearing, Semantic rules, Sentence template, YOLOv3 network

CLC Number: TP391
[1] FARHADI A, HEJRATI M, SADEGHI M A, et al. Every Picture Tells a Story: Generating Sentences from Images[C]∥European Conference on Computer Vision. 2010: 15-29.
[2] KULKARNI G, PREMRAJ V, DHAR S, et al. Baby talk: Understanding and generating simple image descriptions[C]∥IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2011: 1601-1608.
[3] ORDONEZ V, KULKARNI G, BERG T L. Im2Text: describing images using 1 million captioned photographs[C]∥International Conference on Neural Information Processing Systems. Curran Associates Inc., 2011: 1143-1151.
[4] JIA Y, SALZMANN M, DARRELL T. Learning cross-modality similarity for multinomial data[C]∥International Conference on Computer Vision. IEEE, 2011: 2407-2414.
[5] VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: A neural image caption generator[C]∥IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2015: 3156-3164.
[6] XU K, BA J, KIROS R, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention[C]∥International Conference on Machine Learning (ICML). 2015: 2048-2057.
[7] WU Q, SHEN C, LIU L, et al. What Value Do Explicit High Level Concepts Have in Vision to Language Problems?[C]∥Computer Vision and Pattern Recognition. IEEE, 2016: 203-212.
[8] LU J, XIONG C, PARIKH D, et al. Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning[C]∥IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, 2017: 3242-3250.
[9] FENG G C, CHEN Y Y, CHEN N, et al. Research on automatic helmet recognition based on machine vision[J]. Mechanical Design and Manufacturing Engineering, 2015, 44(10): 39-42.
[10] DAHIYA K, SINGH D, MOHAN C K. Automatic detection of bike-riders without helmet using surveillance videos in real-time[C]∥International Joint Conference on Neural Networks. IEEE, 2016: 3046-3051.
[11] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]∥IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2014: 580-587.
[12] GIRSHICK R. Fast R-CNN[C]∥International Conference on Computer Vision. IEEE, 2015: 1440-1448.
[13] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[14] REDMON J, FARHADI A. YOLOv3: An Incremental Improvement[J]. arXiv:1804.02767, 2018.
[15] LIU W, ANGUELOV D, ERHAN D, et al. SSD: Single Shot MultiBox Detector[C]∥European Conference on Computer Vision. 2016: 21-37.
[16] REDMON J, FARHADI A. YOLO9000: Better, Faster, Stronger[C]∥IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2017: 6517-6525.
[17] HE K, ZHANG X, REN S, et al. Deep Residual Learning for Image Recognition[C]∥IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2016: 770-778.
[18] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: Common Objects in Context[C]∥European Conference on Computer Vision. 2014: 740-755.
[19] HODOSH M, YOUNG P, HOCKENMAIER J. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics[C]∥International Conference on Artificial Intelligence. AAAI Press, 2015: 4188-4192.
[20] PLUMMER B A, WANG L, CERVANTES C M, et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models[J]. International Journal of Computer Vision, 2017, 123(1): 74-93.
[21] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]∥Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Philadelphia, Pennsylvania: Association for Computational Linguistics, 2002: 311-318.
[22] BANERJEE S, LAVIE A. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments[C]∥Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Ann Arbor: ACL, 2005: 65-72.
[23] VEDANTAM R, ZITNICK C L, PARIKH D. CIDEr: Consensus-based Image Description Evaluation[C]∥IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2015: 4566-4575.