Computer Science, 2018, Vol. 45, Issue (8): 17-21. doi: 10.11896/j.issn.1002-137X.2018.08.004

• 2017 China Multimedia Conference •

Crowd Counting Method via Scalable Modularized Convolutional Neural Network

LI Yun-bo¹, TANG Si-qi¹, ZHOU Xing-yu², PAN Zhi-song¹

  1. Institute of Command Information System, PLA University of Science and Technology, Nanjing 210000, China
  2. College of Communication Engineering, PLA University of Science and Technology, Nanjing 210000, China
  • Received: 2017-10-24  Online: 2018-08-29  Published: 2018-08-29
  • About the authors: LI Yun-bo (born 1994), male, master's degree, main research interests: machine learning and deep learning, E-mail: 18252059269@163.com; TANG Si-qi (born 1993), female, master's degree, main research interests: deep learning and image vision; ZHOU Xing-yu (born 1985), male, master's degree, lecturer, main research interests: machine learning and pattern recognition; PAN Zhi-song (born 1973), male, Ph.D., professor, main research interests: pattern recognition and artificial intelligence, E-mail: panzs@nuaa.edu.cn (corresponding author).
  • Funding: This work was supported by the project "Research on Attribute Learning and Its Applications" (61473149) and by the 2017 National Key R&D Program "Cyberspace Security" key special project (2017YFB0802800).

Abstract: This paper aims to estimate crowd density in real scenes from images taken at arbitrary viewpoints and with arbitrary crowd density. Crowd counting in static images is challenging: projecting a three-dimensional scene onto a two-dimensional image introduces perspective distortion and occlusion among people, which makes it hard to separate one individual from another and individuals from the background. To address this, the paper proposes a flexible and efficient scalable modularized convolutional neural network (CNN) architecture. The network accepts images of arbitrary size and resolution as direct input, requires no additional computation of perspective-change information, and estimates the number of people by generating a density map. Each module of the architecture uses a multi-column structure with different convolution kernels, which fits individuals at different distances, and combines the feature information of the preceding and following layers, which reduces the accuracy loss caused by vanishing gradients. Experiments on the ShanghaiTech PartA and PartB datasets show that, compared with MCNN, the previous best method, the proposed method improves accuracy by 14.58% and 40.53% and reduces the root mean square error by 23.89% and 33.90%, respectively.

Key words: Crowd counting, Convolutional neural network, Scalable module, Density maps, Feature fusion
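
The abstract states that the crowd count is obtained by generating a density map. The sketch below illustrates only that counting step, under a common convention that this page does not spell out: each annotated head is spread into a Gaussian blob whose mass is normalized to 1, so the map sums to the number of people. The function names, the fixed `sigma`, and the toy annotations are illustrative assumptions, not the paper's settings, and the CNN that would predict such a map from an image is omitted.

```python
import math


def gaussian_kernel(radius, sigma):
    """Return an unnormalized (2*radius+1) x (2*radius+1) Gaussian kernel."""
    return [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
             for dx in range(-radius, radius + 1)]
            for dy in range(-radius, radius + 1)]


def density_map(height, width, heads, sigma=2.0):
    """Build a density map from head annotations given as (row, col) pairs.

    Each head inside the image adds a Gaussian blob whose in-image mass is
    renormalized to 1, so the whole map sums to the number of such heads.
    """
    radius = int(3 * sigma)
    kernel = gaussian_kernel(radius, sigma)
    dmap = [[0.0] * width for _ in range(height)]
    for r, c in heads:
        if not (0 <= r < height and 0 <= c < width):
            continue  # ignore annotations that fall outside the image
        # Keep only the kernel cells that land inside the image.
        cells = []
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = r + dy, c + dx
                if 0 <= y < height and 0 <= x < width:
                    cells.append((y, x, kernel[dy + radius][dx + radius]))
        mass = sum(w for _, _, w in cells)
        for y, x, w in cells:
            dmap[y][x] += w / mass
    return dmap


def count_from_density(dmap):
    """The estimated crowd count is the sum over all density-map pixels."""
    return sum(sum(row) for row in dmap)


# Toy annotations (hypothetical): four heads, one in the image corner.
heads = [(5, 5), (6, 7), (20, 30), (0, 0)]
dmap = density_map(32, 48, heads)
estimated_count = count_from_density(dmap)
print(round(estimated_count, 6))
```

In a trained model the map would be the network's output rather than a rendering of ground-truth annotations; summing it in the same way gives the predicted count, which is then compared with the annotated count to compute the error metrics.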

CLC Number: TP391