Computer Science, 2020, Vol. 47, Issue (11A): 183-187. doi: 10.11896/jsjkx.200300012

• Computer Graphics & Multimedia •

Crowd Counting Model of Convolutional Neural Network Based on Multi-task Learning and Coarse to Fine

CHEN Xun-min, YE Shu-han, ZHAN Rui   

  1. College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China
  • Online: 2020-11-15 Published: 2020-11-17
  • Corresponding author: CHEN Xun-min (chenxunmin1995@163.com)
  • About author: CHEN Xun-min, born in 1995, postgraduate. His main research interests include deep learning and image communication.

Abstract: Crowd counting is the task of estimating the number of people in a single image or a single video frame. To address the insufficient accuracy of existing crowd counting methods, a convolutional neural network crowd counting model based on multi-task learning and a coarse-to-fine strategy is proposed. First, multi-task learning introduces auxiliary tasks related to the original task to guide the learning of the main task: crowd density estimation is the main task of the model, and crowd segmentation serves as an auxiliary task to improve network performance. Second, under the coarse-to-fine strategy, the model predicts the density map in two stages: it first generates a coarse, inaccurate crowd density map and then combines it with the crowd segmentation map to obtain an accurate crowd density map. Experiments on Part A and Part B of the ShanghaiTech dataset and on the UCF_CC_50 dataset show that, compared with CSRNet, the previous best model, the proposed model reduces the absolute error by 4.55%, 14.15% and 19.09%, respectively, and the mean squared error by 10.00%, 19.09% and 19.47%, respectively, significantly improving the accuracy and robustness of crowd counting.

Key words: Convolutional neural network, Crowd counting, Crowd density estimation, Crowd segmentation, Multi-task learning
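
The abstract does not say how the coarse density map and the segmentation map are combined, nor how the two tasks are weighted during training. The following is a minimal sketch under stated assumptions, not the authors' implementation: the fusion is assumed to be an element-wise product, the auxiliary loss is assumed to be binary cross-entropy, and the names `refine_density`, `joint_loss` and `aux_weight` are hypothetical. The evaluation helpers assume the errors are computed over per-image counts.

```python
import math

def refine_density(coarse, seg_prob):
    """Assumed fusion: element-wise product of the coarse density map
    and the crowd-probability (segmentation) map."""
    return [[d * p for d, p in zip(d_row, p_row)]
            for d_row, p_row in zip(coarse, seg_prob)]

def count(density):
    """The predicted count is the sum of all density-map values."""
    return sum(sum(row) for row in density)

def density_loss(pred, target):
    """Main task: mean squared pixel error between density maps."""
    diffs = [(p - t) ** 2
             for p_row, t_row in zip(pred, target)
             for p, t in zip(p_row, t_row)]
    return sum(diffs) / len(diffs)

def segmentation_loss(prob, mask, eps=1e-7):
    """Auxiliary task (assumed): binary cross-entropy against a 0/1 crowd mask."""
    terms = []
    for p_row, m_row in zip(prob, mask):
        for p, m in zip(p_row, m_row):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            terms.append(-(m * math.log(p) + (1 - m) * math.log(1.0 - p)))
    return sum(terms) / len(terms)

def joint_loss(fine, gt_density, seg_prob, gt_mask, aux_weight=0.1):
    """Multi-task objective: main loss plus a weighted auxiliary loss.
    aux_weight is a hypothetical hyperparameter, not a value from the paper."""
    return (density_loss(fine, gt_density)
            + aux_weight * segmentation_loss(seg_prob, gt_mask))

def mae(pred_counts, true_counts):
    """Mean absolute error over per-image counts."""
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(true_counts)

def rmse(pred_counts, true_counts):
    """Root of the mean squared count error."""
    sq = [(p - t) ** 2 for p, t in zip(pred_counts, true_counts)]
    return math.sqrt(sum(sq) / len(sq))

# Toy 2x2 example (made-up values, for illustration only).
coarse = [[0.5, 0.5], [0.25, 0.75]]     # coarse map with a background response
seg_prob = [[1.0, 1.0], [0.0, 1.0]]     # segmentation marks one pixel as background
gt_density = [[0.5, 0.5], [0.0, 1.0]]
gt_mask = [[1, 1], [0, 1]]

fine = refine_density(coarse, seg_prob)
print(count(coarse), count(fine))
print(joint_loss(fine, gt_density, seg_prob, gt_mask))
```

In this toy case the segmentation map zeroes out the background pixel, so the refined map no longer counts the spurious 0.25 response. A learned fusion (for example, a convolution over the concatenated maps) would be an equally plausible reading of the abstract.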

CLC Number: 

  • TP391