Computer Science ›› 2023, Vol. 50 ›› Issue (3): 246-253. doi: 10.11896/jsjkx.220100219

• Computer Graphics & Multimedia •

Crowd Counting Network Based on Feature Enhancement Loss and Foreground Attention

ZHANG Yi1, WU Qin1,2

  1. 1 School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
    2 Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received: 2022-01-23 Revised: 2022-10-04 Online: 2023-03-15 Published: 2023-03-15
  • Corresponding author: WU Qin (qinwu@jiangnan.edu.cn)
  • Author contact: fiercetigerr@outlook.com
  • About author: ZHANG Yi, born in 1996, master candidate, is a member of China Computer Federation. Her main research interests include pattern recognition and computer vision.
    WU Qin, born in 1978, Ph.D, associate professor, is a member of China Computer Federation. Her main research interests include computer vision and pattern recognition.
  • Supported by:
    National Natural Science Foundation of China (61972180).

Abstract: Crowd counting aims to accurately estimate the total number of people in an image and to present their distribution. Images in the relevant datasets cover a wide variety of scenes and contain many people. To save labor, most datasets annotate each person with a single point on the head. However, a point label cannot cover the full extent of a head, so the matching between crowd features and distribution labels is hard to converge and the predicted values fail to concentrate in the foreground region, which seriously degrades the quality of the estimated density map and the counting accuracy. To address this problem, a count loss is used to constrain the range of predicted values over the whole image, complemented by a pixel-level distribution consistency loss that optimizes the density map matching process. In addition, complex scenes contain much background noise that is easily confused with crowd features. To keep false positive predictions from interfering with subsequent counting and density map estimation, a foreground segmentation module and a feature enhancement loss are proposed to adaptively focus on the foreground region and to increase the contribution of head features at foreground positions to the count, thereby suppressing background misjudgments. Furthermore, to help the network adapt to the multi-scale appearance of human heads, each training image is both upsampled and downsampled to obtain multi-scale versions of the same targets. Experiments on several datasets show that the proposed method achieves better or competitive results compared with state-of-the-art methods.

Key words: Crowd counting, Deep learning, Foreground segmentation, Background compensation, Density estimation
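The abstract does not give the exact loss definitions, so the following is only a generic sketch of the ideas it names, written in plain Python. It shows a count loss on the integrated density map, a per-pixel squared error used here as a stand-in for the distribution consistency term, a binary foreground mask that removes background responses, and point annotations rescaled along with a resampled image. All function names, the mask, and the toy values are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: generic formulations, not the paper's definitions.

def total_count(density):
    """Sum a 2-D density map (list of rows); the sum is the estimated count."""
    return sum(sum(row) for row in density)

def count_loss(pred, gt_count):
    """Absolute error between the integrated prediction and the labeled count."""
    return abs(total_count(pred) - gt_count)

def pixel_loss(pred, target):
    """Mean squared per-pixel error (assumed stand-in for a pixel-level
    distribution consistency term)."""
    n = len(pred) * len(pred[0])
    return sum((p - t) ** 2
               for pr, tr in zip(pred, target)
               for p, t in zip(pr, tr)) / n

def apply_foreground(pred, mask):
    """Zero out predictions wherever the (hypothetical) foreground mask is 0."""
    return [[p * m for p, m in zip(pr, mr)] for pr, mr in zip(pred, mask)]

def rescale_points(points, scale):
    """Scale head-point labels with the image; the number of points is unchanged."""
    return [(x * scale, y * scale) for x, y in points]

# Toy prediction: two heads in the foreground plus 0.25 + 0.25 of background noise.
pred = [[0.5, 0.5, 0.25],
        [0.0, 1.0, 0.25]]
mask = [[1, 1, 0],
        [0, 1, 0]]
masked = apply_foreground(pred, mask)

print(count_loss(pred, 2))    # error caused by background responses
print(count_loss(masked, 2))  # error after masking the background
```

In this toy case the unmasked prediction overcounts because of the background responses, and masking removes that error. A trained model would learn a soft foreground map rather than use a fixed binary mask.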

CLC Number: 

  • TP391.413