Computer Science, 2024, Vol. 51, Issue (11): 205-212. doi: 10.11896/jsjkx.230900013

Computer Graphics & Multimedia

Scene Graph Generation Combined with Object Attribute Recognition

ZHOU Hao¹, LUO Tingjin², CUI Guoheng¹

  1 Department of Operational Research and Planning, Naval University of Engineering, Wuhan 430033, China
  2 College of Science, National University of Defense Technology, Changsha 410073, China
  • Received: 2023-09-04  Revised: 2024-04-06  Online: 2024-11-15  Published: 2024-11-06
  • About author: ZHOU Hao, born in 1993, Ph.D., lecturer, is a member of CCF (No. T6933M). His main research interests include scene graph generation, image understanding and causal inference.
    LUO Tingjin, born in 1989, Ph.D., professor, master's supervisor, is a senior member of CCF (No. C4089S). His main research interests include weakly supervised learning, data mining and machine learning.
  • Supported by:
    Young Scientists Fund of the National Natural Science Foundation of China (62302516), National Natural Science Foundation of China (62376281), Natural Science Foundation of Hubei Province, China (2022CFC049) and NSF for Huxiang Young Talents Program of Hunan Province (2021RC3070).

Abstract: Scene graph generation (SGG) plays an important role in deep visual understanding tasks. Existing SGG methods mainly focus on the locations and categories of objects and on the relationships between objects, while ignoring the rich semantic information carried by object attributes. This paper proposes an SGG model that integrates object attributes. Firstly, to achieve multi-label object attribute recognition, we propose composite classifiers that combine a multi-class classifier trained with an improved group cross entropy loss and a binary classifier trained with a binary cross entropy loss, which improves both the accuracy and the recall of multi-attribute prediction. Then, the attribute recognition branch is fused into the SGG framework: as a form of context information, the attribute features are fed into the relationship branch for better relationship classification. Finally, compared with several baseline models, our method achieves better performance in both object attribute prediction and relationship recognition on the VG150 dataset.
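The abstract does not give the exact form of the improved group cross entropy loss, nor how the two heads of the composite classifier are combined at inference, so the following is only a minimal Python sketch under stated assumptions, not the authors' code. It assumes that group cross entropy averages the softmax cross-entropy terms over all ground-truth attributes of an object, that the binary head uses standard sigmoid binary cross entropy, that the two losses are summed with a weight (the hypothetical parameter `bce_weight`), and that the composite prediction is the union of the multi-class head's top-1 attribute and every attribute whose sigmoid score exceeds a threshold (the hypothetical parameter `threshold`). All function names and the toy attribute vocabulary are illustrative.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one object's attribute logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]


def group_cross_entropy(logits: Sequence[float], positives: Sequence[int]) -> float:
    """ASSUMED form: mean softmax cross entropy over the ground-truth attribute set.

    Each positive attribute is treated as a target of the same softmax, so the
    multi-class head can learn from objects that carry several attributes.
    """
    if not positives:
        return 0.0
    log_probs = log_softmax(logits)
    return -sum(log_probs[k] for k in positives) / len(positives)


def binary_cross_entropy(logits: Sequence[float], positives: Sequence[int]) -> float:
    """Mean sigmoid binary cross entropy: one independent yes/no per attribute."""
    pos = set(positives)
    total = 0.0
    for k, z in enumerate(logits):
        y = 1.0 if k in pos else 0.0
        # Stable form of -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def composite_loss(mc_logits: Sequence[float], bin_logits: Sequence[float],
                   positives: Sequence[int], bce_weight: float = 1.0) -> float:
    """HYPOTHETICAL weighted sum of the multi-class and binary heads' losses."""
    return (group_cross_entropy(mc_logits, positives)
            + bce_weight * binary_cross_entropy(bin_logits, positives))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_attributes(mc_logits: Sequence[float], bin_logits: Sequence[float],
                       threshold: float = 0.5) -> List[int]:
    """HYPOTHETICAL fusion: top-1 of the multi-class head plus every attribute
    that the binary head scores above `threshold`."""
    top1 = max(range(len(mc_logits)), key=lambda k: mc_logits[k])
    chosen = {k for k, z in enumerate(bin_logits) if sigmoid(z) > threshold}
    chosen.add(top1)
    return sorted(chosen)


if __name__ == "__main__":
    attributes = ["white", "wooden", "large", "red"]  # toy vocabulary
    mc_logits = [2.0, 0.5, 0.1, -1.0]   # multi-class head output
    bin_logits = [1.5, 0.8, -2.0, -3.0]  # binary head output
    labels = [0, 1]  # ground truth: "white", "wooden"
    print(round(composite_loss(mc_logits, bin_logits, labels), 4))
    print([attributes[k] for k in predict_attributes(mc_logits, bin_logits)])
    # prints ['white', 'wooden']
```

In this sketch the top-1 term guarantees that every object receives at least one attribute, while the binary head can add further ones; this is one plausible way such a pairing could raise recall, not a rule documented in the paper.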

Key words: Scene graph generation, Object attribute recognition, Attribute fusion, Relationship classification, Multi-label learning, Group cross entropy function

CLC Number: TP391