Computer Science ›› 2024, Vol. 51 ›› Issue (11): 205-212. doi: 10.11896/jsjkx.230900013

• Computer Graphics & Multimedia •

Research on an Image Scene Graph Generation Method Combining Object Attribute Recognition

ZHOU Hao1, LUO Tingjin2, CUI Guoheng1

  1. Department of Operational Research and Planning, Naval University of Engineering, Wuhan 430033, China
    2. College of Science, National University of Defense Technology, Changsha 410073, China
  • Received: 2023-09-04  Revised: 2024-04-06  Online: 2024-11-15  Published: 2024-11-06
  • Corresponding author: LUO Tingjin (tingjinluo@hotmail.com)
  • About author: (zhouhao3075@hotmail.com)
  • Supported by:
    Young Scientists Fund of the National Natural Science Foundation of China (62302516); National Natural Science Foundation of China (62376281); Natural Science Foundation of Hubei Province (2022CFC049); Huxiang Young Talents Program of Hunan Province (2021RC3070).

Scene Graph Generation Combined with Object Attribute Recognition

ZHOU Hao1, LUO Tingjin2, CUI Guoheng1   

  1. Department of Operational Research and Planning, Naval University of Engineering, Wuhan 430033, China
    2. College of Science, National University of Defense Technology, Changsha 410073, China
  • Received: 2023-09-04  Revised: 2024-04-06  Online: 2024-11-15  Published: 2024-11-06
  • About author: ZHOU Hao, born in 1993, Ph.D, lecturer, is a member of CCF (No.T6933M). His main research interests include scene graph generation, image understanding and causal inference.
    LUO Tingjin, born in 1989, Ph.D, professor, master supervisor, is a senior member of CCF (No.C4089S). His main research interests include weakly supervised learning, data mining and machine learning.
  • Supported by:
    Young Scientists Fund of the National Natural Science Foundation of China (62302516), National Natural Science Foundation of China (62376281), Natural Science Foundation of Hubei Province, China (2022CFC049) and the Huxiang Young Talents Program of Hunan Province (2021RC3070).

Abstract: Scene graph generation plays an important role in deep visual scene understanding tasks. Existing scene graph generation methods mainly focus on the locations and categories of objects in a scene and the relationships between them, while ignoring the rich scene semantics carried by object attributes. To incorporate image attribute semantics into the scene graph, this paper proposes an image scene graph generation method combined with object attribute recognition. First, for the multi-label classification problem of attribute recognition, an attribute classification loss function based on a composite classifier is proposed: a binary classifier trained with the binary cross-entropy loss is combined with a multi-class classifier trained with an improved group cross-entropy loss, so that both the precision of single-attribute classification and the recall of multi-attribute prediction are improved. Second, the attribute recognition branch is fused into the original scene graph framework, and the extracted attribute information is combined with object features as additional contextual semantics to assist the recognition of relationships between objects. Finally, the model is compared with several baseline models on the VG150 dataset, and the results show that the proposed model achieves better results in both object attribute prediction and relationship recognition.
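The composite attribute loss described above can be illustrated with a short PyTorch-style sketch. This is only an assumed reconstruction of the idea, not the paper's implementation: a per-attribute binary cross-entropy term targets single-attribute precision, while a set-wise "group" cross-entropy term (written here in the spirit of the ZLPR multi-label loss) targets the recall of the whole attribute set. The function names, tensor shapes and the weight alpha are illustrative assumptions.

    # Sketch of a composite multi-label attribute loss (assumed formulation, for illustration only).
    import torch
    import torch.nn.functional as F

    def group_cross_entropy(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        """Set-wise multi-label loss that pushes every positive logit above every negative one.

        logits:  (batch, num_attributes) raw attribute scores
        targets: (batch, num_attributes) multi-hot ground-truth attribute labels
        """
        neg_inf = torch.finfo(logits.dtype).min
        pos_mask = targets.bool()
        # log(1 + sum over positive attributes of exp(-score))
        pos_term = F.softplus(torch.logsumexp(
            torch.where(pos_mask, -logits, torch.full_like(logits, neg_inf)), dim=-1))
        # log(1 + sum over negative attributes of exp(score))
        neg_term = F.softplus(torch.logsumexp(
            torch.where(~pos_mask, logits, torch.full_like(logits, neg_inf)), dim=-1))
        return (pos_term + neg_term).mean()

    def composite_attribute_loss(logits, targets, alpha: float = 0.5):
        """Weighted sum of a per-attribute binary cross-entropy term (precision-oriented)
        and the set-wise group term above (recall-oriented); alpha is an assumed weight."""
        bce = F.binary_cross_entropy_with_logits(logits, targets.float())
        return alpha * bce + (1.0 - alpha) * group_cross_entropy(logits, targets)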

Key words: Scene graph generation, Object attribute recognition, Attribute fusion, Relationship prediction, Multi-label classification, Group cross entropy function

Abstract: Scene graph generation (SGG) plays an important role in deep visual understanding tasks. Existing SGG methods mainly focus on the locations and categories of objects, as well as the relationships between objects, while ignoring that object attributes also carry rich semantic information. This paper proposes an SGG model that integrates object attribute recognition. First, to handle multi-label object attribute recognition, we propose a composite classifier that combines a multi-class classifier trained with an improved group cross-entropy loss and a binary classifier trained with the binary cross-entropy loss, which improves both the precision of single-attribute classification and the recall of multi-attribute prediction. Then, the attribute recognition branch is fused into the SGG framework: the extracted attribute features are fed into the relationship branch as additional context for better relationship classification. Finally, compared with several baseline models, the proposed method achieves better performance in both object attribute prediction and relationship recognition on the VG150 dataset.
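As a companion to the abstract above, the following sketch shows one way the attribute branch's output could be fused with object features as extra context before relationship classification. It is an assumed design for illustration only; the module name AttributeAwareRelationHead, the additive fusion, and all dimensions (including the 51-way predicate classifier assumed for VG150) are assumptions rather than the paper's actual architecture.

    # Sketch of attribute-context fusion for relationship classification (assumed design).
    import torch
    import torch.nn as nn

    class AttributeAwareRelationHead(nn.Module):
        def __init__(self, obj_dim: int = 512, attr_dim: int = 128, num_predicates: int = 51):
            super().__init__()
            # Project each object's attribute representation into the visual feature space.
            self.attr_proj = nn.Sequential(nn.Linear(attr_dim, obj_dim), nn.ReLU())
            # Classify the predicate from the fused subject/object pair representation.
            self.rel_classifier = nn.Linear(2 * obj_dim, num_predicates)

        def forward(self, subj_feat, obj_feat, subj_attr, obj_attr):
            # Fuse visual features with attribute context (simple additive fusion here).
            subj_ctx = subj_feat + self.attr_proj(subj_attr)
            obj_ctx = obj_feat + self.attr_proj(obj_attr)
            pair = torch.cat([subj_ctx, obj_ctx], dim=-1)
            return self.rel_classifier(pair)  # (num_pairs, num_predicates) predicate logits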

Key words: Scene graph generation, Object attribute recognition, Attribute fusion, Relationship classification, Multi-label learning, Group cross entropy function

CLC Number:

  • TP391