Computer Science, 2018, Vol. 45, Issue (8): 17-21. doi: 10.11896/j.issn.1002-137X.2018.08.004

• 2017 China Multimedia Conference •

Crowd Counting Method via Scalable Modularized Convolutional Neural Network

LI Yun-bo¹, TANG Si-qi¹, ZHOU Xing-yu², PAN Zhi-song¹

  1. Institute of Command Information System, PLA University of Science and Technology, Nanjing 210000, China
  2. College of Communication Engineering, PLA University of Science and Technology, Nanjing 210000, China
  • Received: 2017-10-24 Online: 2018-08-29 Published: 2018-08-29
  • About the authors: LI Yun-bo (born 1994), male, M.S.; main research interests: machine learning and deep learning; E-mail: 18252059269@163.com. TANG Si-qi (born 1993), female, M.S.; main research interests: deep learning and image vision. ZHOU Xing-yu (born 1985), male, M.S., lecturer; main research interests: machine learning and pattern recognition. PAN Zhi-song (born 1973), male, Ph.D., professor; main research interests: pattern recognition and artificial intelligence; E-mail: panzs@nuaa.edu.cn (corresponding author).
  • Supported by:
    This work was supported by the project "Research on Attribute Learning and Its Applications" (61473149) and by the "Cyberspace Security" key special project of the 2017 National Key Research and Development Program (2017YFB0802800).

Abstract: This paper aims to accurately estimate crowd density in real scenes from images taken from arbitrary perspectives and with arbitrary crowd density. Crowd counting on static images is a challenging problem: projecting 3D space onto a 2D image causes perspective distortion and crowd occlusion, which make it difficult to distinguish individuals from one another and from the background. To this end, this paper proposes a flexible and efficient scalable modularized convolutional neural network (CNN) architecture. The network directly accepts input images of arbitrary size and resolution, requires no additional computation of perspective-change information, and estimates the number of people by generating a density map. Each module of the architecture employs a multi-column structure with different convolution kernels, which can fit individuals at different distances. Each module also combines the feature information of the preceding and following layers, reducing the accuracy loss caused by vanishing gradients. Experiments show that, on the ShanghaiTech PartA and PartB datasets, the accuracy of the proposed method is 14.58% and 40.53% higher, and the root mean square error is 23.89% and 33.90% lower, respectively, than those of MCNN, the previous best method.

Key words: Convolutional neural network, Crowd counting, Density maps, Feature fusion, Scalable module
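
To illustrate how a density map yields a crowd count, the following is a minimal sketch and not the authors' implementation. It assumes the common convention of placing one unit-mass Gaussian at each annotated head position, so that the sum over the map equals the number of people; the abstract does not specify how the maps are built. The function names, kernel size, sigma, and head coordinates are all hypothetical.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    half = size // 2
    raw = [[math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2.0 * sigma ** 2))
            for x in range(size)] for y in range(size)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


def density_map(height, width, heads, size=7, sigma=1.5):
    """Build a density map with one unit-mass Gaussian per head at (row, col).

    Assumption: size and sigma are fixed, hypothetical values.
    """
    dmap = [[0.0] * width for _ in range(height)]
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    for r, c in heads:
        # Keep only the kernel cells that fall inside the image.
        cells = []
        for dy in range(size):
            for dx in range(size):
                y, x = r + dy - half, c + dx - half
                if 0 <= y < height and 0 <= x < width:
                    cells.append((y, x, kernel[dy][dx]))
        # Renormalize so a head near the border still contributes mass 1.
        mass = sum(v for _, _, v in cells)
        for y, x, v in cells:
            dmap[y][x] += v / mass
    return dmap


def count_from_density(dmap):
    """The estimated count is the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in dmap)


# Hypothetical head annotations (row, col) in a 16 x 20 image.
heads = [(2, 3), (10, 12), (0, 0), (15, 19)]
dmap = density_map(16, 20, heads)
estimated = count_from_density(dmap)
print(round(estimated, 6))  # equals the number of annotated heads: 4.0
```

In this setting a network is trained to regress such a map from the image, and the predicted count is read off by summing the predicted map.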

CLC Number: 

  • TP391
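
The module described in the abstract (several columns with different convolution kernels, fused with features from an earlier layer, stackable, and applicable to images of any size) can be sketched roughly as below. This is only an illustration under stated assumptions: the kernel sizes (3, 5, 7), the mean-filter weights standing in for learned weights, and fusion by channel concatenation are all hypothetical choices, not details confirmed by the abstract.

```python
def box_kernel(size):
    """Hypothetical stand-in for learned weights: a size x size mean filter."""
    weight = 1.0 / (size * size)
    return [[weight] * size for _ in range(size)]


def conv2d_same(feature, kernel):
    """Zero-padded 'same' cross-correlation of one 2-D map with an odd square kernel."""
    h, w = len(feature), len(feature[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - pad, x + dx - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += feature[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


def column(channels, size):
    """One column: convolve every input channel, sum the responses, apply ReLU."""
    kernel = box_kernel(size)
    h, w = len(channels[0]), len(channels[0][0])
    total = [[0.0] * w for _ in range(h)]
    for channel in channels:
        response = conv2d_same(channel, kernel)
        for y in range(h):
            for x in range(w):
                total[y][x] += response[y][x]
    return [[max(v, 0.0) for v in row] for row in total]


def multi_column_module(channels, kernel_sizes=(3, 5, 7)):
    """Run columns with different receptive fields, then fuse with the input.

    Assumption: fusion is modeled as concatenating the module input with the
    column outputs, which also gives gradients a short path to earlier layers.
    """
    columns = [column(channels, size) for size in kernel_sizes]
    return columns + list(channels)


# The module is fully convolutional, so inputs of different sizes both work.
small = [[float((x + y) % 3) for x in range(8)] for y in range(6)]
large = [[float((x * y) % 5) for x in range(16)] for y in range(12)]
one_module = multi_column_module([small])
two_modules = multi_column_module(multi_column_module([large]))  # stacked
print(len(one_module), len(two_modules))  # 4 7
```

Because every operation is a padded convolution, the spatial size of the output always matches the input, which is what allows modules to be stacked and images of arbitrary size to be fed in directly.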