Computer Science, 2020, Vol. 47, Issue (4): 150-156. doi: 10.11896/jsjkx.190400034

• Computer Graphics & Multimedia •

Crowd Counting Based on Single-column Multi-scale Convolutional Neural Network

PENG Xian, PENG Yu-xu, TANG Qiang, SONG Yan-qi

  1. School of Computer and Communication Engineering, Changsha University of Science & Technology, Changsha 410000, China
  • Received: 2019-04-05 Online: 2020-04-15 Published: 2020-04-15
  • Contact: PENG Yu-xu (373836911@qq.com), born in 1977, Ph.D., associate professor, CCF member, mainly focuses on signal and information processing.
  • About author: PENG Xian, born in 1994, master. His main research area is deep learning.
  • Supported by:
    This work was supported by the Research Foundation of Education Bureau of Hunan Province, China (18B162) and the Young Teacher Development Foundation of Changsha University of Science & Technology (2019QJCZ014).

Abstract: Crowd counting in single images and surveillance videos has received increasing attention in recent years. Scale variation and crowd occlusion make crowd counting a very challenging task, but deep convolutional neural networks have proved effective at it. This paper proposes a single-column multi-scale convolutional neural network, a data-driven deep learning method that can understand a wide variety of scenes and produce accurate count estimates. The network consists of a front end and a middle end, which together extract two-dimensional features, and a back end, which restores the density map. Stacked pooling replaces the max-pooling layers, which increases the scale invariance of the model without introducing additional parameters. The front end adopts part of the VGG-16 structure. The middle end adopts FME (feature aggregation module), which breaks the independence between different columns to better extract multi-scale feature information. The back end adopts three-column, five-layer dilated convolutions with different dilation rates, which enlarge the receptive field while keeping the resolution unchanged and generate higher-quality crowd density maps. A relative count loss is also introduced to improve performance on sparse crowds. The model performs well on two of the most challenging crowd counting datasets. Experimental results show that, on the two subsets of the public ShanghaiTech dataset and on UCF_CC_50, the mean absolute error (MAE) and mean square error (MSE) of the proposed method are 66.2 and 103.0, 8.7 and 13.4, and 251.0 and 329.5, respectively, which is better than traditional crowd counting methods. Compared with other models, the proposed model has higher accuracy and better robustness, and it counts more accurately on images with sparse crowds.

Key words: Convolutional neural networks, Crowd counting, Dilated convolution, Feature aggregation, Relative count loss, Stacked pooling
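
The abstract states that dilated convolution enlarges the receptive field while keeping the resolution unchanged. The sketch below illustrates that property in one dimension using plain Python. It is not the authors' implementation: the function names, the 3-tap kernel, and the dilation rates 1, 2 and 3 are assumptions chosen only for illustration, since the abstract does not list the rates used in each column.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of stacked stride-1 convolutions."""
    rf = 1
    for d in dilations:
        # A k-tap kernel with dilation d spans d * (k - 1) + 1 input positions,
        # so each layer widens the receptive field by d * (k - 1).
        rf += d * (kernel_size - 1)
    return rf


def dilated_conv1d(signal, kernel, dilation):
    """Stride-1 dilated convolution (cross-correlation) with zero 'same' padding.

    Assumes an odd-length kernel so that the padding is symmetric.
    """
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


if __name__ == "__main__":
    # Hypothetical rates: one dilation rate per column, five layers per column.
    for rate in (1, 2, 3):
        print("dilation", rate, "-> receptive field", receptive_field(3, [rate] * 5))

    x = [float(v) for v in range(8)]
    y = dilated_conv1d(x, [1.0, 1.0, 1.0], dilation=2)
    print("input length", len(x), "output length", len(y))
```

With the same number of parameters per layer, a larger dilation rate widens the receptive field, and the padded output keeps the input length, which is why no pooling or upsampling is needed to preserve the density map resolution.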

CLC Number: TP391
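
The evaluation metrics and the relative count loss named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code. The abstract gives no formulas, so two assumptions are made here: `mse` takes the square root of the mean squared count error, a convention common in crowd-counting work, and `relative_count_loss` normalizes the count error by the true count plus one, a hypothetical form chosen to show why such a loss weights sparse images more heavily.

```python
import math


def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and true per-image counts."""
    n = len(true_counts)
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / n


def mse(pred_counts, true_counts):
    """Root of the mean squared count error (assumed convention for 'MSE')."""
    n = len(true_counts)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts)) / n)


def relative_count_loss(pred_counts, true_counts):
    """Hypothetical relative count loss: squared error scaled by (true count + 1).

    The +1 avoids division by zero for images that contain no people.
    """
    n = len(true_counts)
    return sum(((p - t) / (t + 1)) ** 2 for p, t in zip(pred_counts, true_counts)) / n


if __name__ == "__main__":
    pred = [12.0, 95.0]   # made-up predictions: one sparse image, one denser image
    true = [10.0, 100.0]  # made-up ground-truth counts
    print("MAE:", mae(pred, true))
    print("MSE:", mse(pred, true))
    print("relative loss, sparse image:", relative_count_loss(pred[:1], true[:1]))
    print("relative loss, dense image:", relative_count_loss(pred[1:], true[1:]))
```

In this made-up example the sparse image has the smaller absolute error but the larger relative loss, which is the behavior a relative count term is meant to provide when training on sparse crowds.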