Human Parsing Model Combined with Regional Sampling and Inter-class Loss

doi:10.11896/jsjkx.220100259

Computer Science, 2023, Vol. 50, Issue (4): 103-109. doi: 10.11896/jsjkx.220100259

Section: Computer Graphics & Multimedia

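LI Yang, HAN Ping

School of Information Engineering, Wuhan University of Technology, Wuhan 430070

Received: 2022-01-27 Revised: 2022-06-23 Publication date: 2023-04-15 Release date: 2023-04-06
Corresponding author: HAN Ping (hanping@whut.edu.cn)
About the author: (yang_li314@163.com)
Funding:
Fundamental Research Funds for the Central Universities (WUT:2018III069GX)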

Human Parsing Model Combined with Regional Sampling and Inter-class Loss

LI Yang, HAN Ping

School of Information Engineering, Wuhan University of Technology, Wuhan 430070, China

Received: 2022-01-27 Revised: 2022-06-23 Online: 2023-04-15 Published: 2023-04-06
About author: LI Yang, born in 1998, postgraduate. His main research interests include deep learning and semantic segmentation.
HAN Ping, born in 1980, Ph.D., associate professor. His main research interests include deep learning, computer vision, and embedded systems.
Supported by:
Fundamental Research Funds for the Central Universities(WUT:2018III069GX).

Abstract

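Abstract (Chinese edition): Human parsing is a fine-grained semantic segmentation task. As the annotated categories in human parsing datasets become more refined, the datasets follow a long-tailed distribution, which makes similar categories increasingly difficult to distinguish. Balanced sampling is an effective way to address long-tailed distributions. To address the difficulty of sampling annotated targets in a balanced way in human parsing and the increased misjudgment rate of the model on similar categories, this paper proposes a human parsing model that combines regional sampling and inter-class loss. The model consists of three parts: a semantic segmentation network, a regionally balanced sampling module (RBSM), and an inter-class loss module (ILM). First, the image to be parsed is fed into the semantic segmentation network to obtain a preliminary prediction. The RBSM samples the preliminary prediction and the ground-truth label, and the main loss is computed on the sampled prediction and label. Meanwhile, the last-layer feature heatmap of the semantic segmentation network is extracted and fed, together with the ground-truth label, into the ILM to compute the inter-class loss. The model optimizes the main loss and the inter-class loss jointly, which yields a more accurate model. Experimental results on the MHPv2.0 dataset show that the model improves the mIoU metric by more than 1.3% without changing the structure of the original semantic segmentation network, effectively mitigating the impact of the long-tailed distribution and inter-class similarity on human parsing.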
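The abstract does not give the exact sampling rule or the form of the inter-class loss, so the following is only a minimal sketch of the two-term objective under stated assumptions. Here the regional sampling is approximated by drawing at most `per_class` pixels per class, and the inter-class term is a stand-in that penalizes cosine similarity between class-mean features. The names `balanced_sample`, `inter_class_loss`, `total_loss`, `per_class`, and `weight` are hypothetical and do not come from the paper.

```python
import math
import random
from collections import defaultdict


def balanced_sample(labels, per_class, rng):
    """Pick at most `per_class` pixel indices for every class present in `labels`."""
    by_class = defaultdict(list)
    for idx, cls in enumerate(labels):
        by_class[cls].append(idx)
    picked = []
    for cls in sorted(by_class):
        pool = by_class[cls]
        picked.extend(rng.sample(pool, min(per_class, len(pool))))
    return picked


def cross_entropy(logits, labels, indices):
    """Mean softmax cross-entropy over the selected pixels only (the 'main loss')."""
    total = 0.0
    for i in indices:
        row = logits[i]
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[labels[i]]
    return total / len(indices)


def inter_class_loss(features, labels):
    """Assumed stand-in: mean cosine similarity between class-mean feature vectors."""
    sums, counts = {}, defaultdict(int)
    for feat, cls in zip(features, labels):
        acc = sums.setdefault(cls, [0.0] * len(feat))
        for d, v in enumerate(feat):
            acc[d] += v
        counts[cls] += 1
    centers = [[v / counts[c] for v in sums[c]] for c in sorted(sums)]
    pairs, total = 0, 0.0
    for a in range(len(centers)):
        for b in range(a + 1, len(centers)):
            dot = sum(x * y for x, y in zip(centers[a], centers[b]))
            na = math.sqrt(sum(x * x for x in centers[a]))
            nb = math.sqrt(sum(y * y for y in centers[b]))
            total += dot / (na * nb + 1e-12)
            pairs += 1
    return total / pairs if pairs else 0.0


def total_loss(logits, features, labels, per_class=2, weight=0.1, seed=0):
    """Main loss on sampled pixels plus a weighted inter-class term."""
    rng = random.Random(seed)
    idx = balanced_sample(labels, per_class, rng)
    return cross_entropy(logits, labels, idx) + weight * inter_class_loss(features, labels)


# Toy long-tailed example: five pixels of class 0, one pixel of class 1.
labels = [0, 0, 0, 0, 0, 1]
logits = [[2.0, 0.0]] * 5 + [[0.0, 2.0]]
features = [[1.0, 0.0]] * 5 + [[0.0, 1.0]]
loss = total_loss(logits, features, labels)
```

In this toy case the sampler keeps two pixels of the frequent class and the single pixel of the rare class, so the rare class carries one third of the main loss instead of one sixth. A real implementation would operate on image regions and network tensors rather than flat Python lists.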

Key words (Chinese edition): Regional sampling, Inter-class loss, Long-tailed distribution, Human parsing, Semantic segmentation

Abstract: Human parsing is a fine-grained semantic segmentation task. The refinement of annotated categories in human parsing datasets makes the datasets follow a long-tailed distribution and increases the difficulty of identifying similar categories. Balanced sampling is an effective way to address the long-tailed distribution problem, but it is difficult to achieve balanced sampling of the labeled objects in human parsing. In addition, fine-grained annotation makes the model more likely to misjudge similar categories. In response to these problems, a human parsing model combining regional sampling and inter-class loss is proposed. The model consists of a semantic segmentation network, a regionally balanced sampling module (RBSM), and an inter-class loss module (ILM). First, the images are parsed by the semantic segmentation network. Next, the parsing results and the ground truth labels are sampled by the regionally balanced sampling module. The sampled parsing results and sampled ground truth labels are then used to calculate the main loss. Meanwhile, the inter-class loss between the heatmap features from the last layer of the semantic segmentation network and the ground truth labels is calculated in the inter-class loss module, and the main loss and the inter-class loss are optimized at the same time to obtain a more accurate model. Experimental results on the MHPv2.0 dataset show that the mIoU of the proposed model improves by more than 1.3% without changing the structure of the semantic segmentation network. The algorithm effectively reduces the impact of the long-tailed distribution and of the similarity among categories.
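For readers unfamiliar with the reported metric, the following is a minimal sketch of mean intersection over union (mIoU) on flattened label maps. It assumes a single prediction/ground-truth pair and averages only over classes that occur in either map; benchmark protocols such as the one used for MHPv2.0 may accumulate counts over the whole dataset instead. The function name `mean_iou` is illustrative.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the ground truth."""
    ious = []
    for cls in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Six pixels, three classes: per-class IoU is 1/3, 2/3 and 1/2.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)  # 0.5
```

Because every class contributes equally to the average regardless of its pixel count, mIoU is sensitive to errors on rare classes, which is why it is a natural metric for long-tailed parsing datasets.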

Key words: Regional sampling, Inter-class loss, Long-tailed distribution, Human parsing, Semantic segmentation

CLC Number: TP391

LI Yang, HAN Ping. Human Parsing Model Combined with Regional Sampling and Inter-class Loss[J]. Computer Science, 2023, 50(4): 103-109. https://doi.org/10.11896/jsjkx.220100259
