Computer Science ›› 2024, Vol. 51 ›› Issue (11): 205-212. doi: 10.11896/jsjkx.230900013
周浩1, 罗廷金2, 崔国恒1
ZHOU Hao1, LUO Tingjin2, CUI Guoheng1
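Abstract: Scene graph generation plays an important role in the deep understanding of visual scenes. Existing scene graph generation methods focus mainly on the locations and categories of objects in a scene and on the relationships between objects, while ignoring the rich scene semantics carried by object attributes. To incorporate image attribute semantics into scene graphs, an image scene graph generation method combined with object attribute recognition is proposed. First, to address the multi-label classification problem in attribute recognition, an attribute classification loss function based on a hybrid classifier is proposed. It combines a binary classifier trained with the binary cross-entropy function and a multi-class classifier trained with an improved group cross-entropy function, which improves both the precision of single-attribute classification and the recall of multi-attribute prediction. Second, the attribute recognition branch is integrated into the original scene graph framework, and the extracted attribute information is fused with the object features as additional contextual semantics to assist the recognition of relationships between objects. Finally, the model is compared with several baseline models on the VG150 dataset, and the results show that the proposed model achieves better results in both object attribute prediction and relationship recognition.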
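To make the hybrid-classifier idea in the abstract concrete, the following is a minimal sketch of a loss that sums a binary cross-entropy term and a softmax-style multi-label term over one object's attribute logits. The abstract does not give the formula for the improved group cross-entropy, so the second term here is a stand-in (a generic log-sum-exp multi-label cross-entropy), not the authors' loss. The function names and the `weight` mixing parameter are hypothetical and chosen only for illustration.

```python
import math


def bce_loss(logits, targets):
    """Mean binary cross-entropy with logits: one independent
    binary decision per attribute."""
    total = 0.0
    for s, y in zip(logits, targets):
        # Numerically stable form of -[y*log(p) + (1-y)*log(1-p)], p = sigmoid(s)
        total += max(s, 0.0) - s * y + math.log1p(math.exp(-abs(s)))
    return total / len(logits)


def logsumexp(values):
    """Stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def multilabel_softmax_ce(logits, targets):
    """Stand-in for the group cross-entropy (an assumption, not the
    paper's formula): log(1 + sum_neg e^s) + log(1 + sum_pos e^-s).
    It pushes positive logits above 0 and negative logits below 0 jointly."""
    neg = [0.0] + [s for s, y in zip(logits, targets) if y == 0]
    pos = [0.0] + [-s for s, y in zip(logits, targets) if y == 1]
    return logsumexp(neg) + logsumexp(pos)


def hybrid_attribute_loss(logits, targets, weight=0.5):
    """Weighted sum of the two terms; `weight` is a hypothetical
    mixing coefficient, not a value taken from the paper."""
    return (weight * bce_loss(logits, targets)
            + (1.0 - weight) * multilabel_softmax_ce(logits, targets))


if __name__ == "__main__":
    # Three attribute logits for one object; attributes 0 and 2 are present.
    print(hybrid_attribute_loss([2.0, -1.5, 0.3], [1, 0, 1]))
```

In this sketch the binary term scores each attribute on its own, while the joint term compares all positive and negative attributes of an object at once. This is one plausible reading of how a hybrid of the two classifiers could trade off per-attribute precision against multi-attribute recall.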