Computer Science ›› 2025, Vol. 52 ›› Issue (11): 141-149. doi: 10.11896/jsjkx.240900113
DING Yuanbo, BAI Lin, LI Taoshen
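Abstract: Fine-grained information, as a form of contextual information, helps a model distinguish human-object interactions whose relative spatial relationships are similar. However, how to use this key cue to model, in a unified way, feature information of different granularities across multi-scale feature maps remains one of the main obstacles to further improving the accuracy of human-object interaction detection. To address this problem, a human-object interaction detection model based on a fine-grained attention mechanism (FGDHOI) is proposed. Guided by fine-grained information, the model strengthens local features, fuses feature maps of different scales, automatically learns image content through a deformable attention mechanism, and models long-range dependencies among features of different granularities, thereby fundamentally improving the accuracy of human-object interaction detection. Extensive qualitative, quantitative, and ablation experiments were conducted on the V-COCO and HICO datasets. The results show that, compared with the baseline model, the proposed method improves mAP by 7.7 percentage points on V-COCO and by 7.43, 7.5, and 7.85 percentage points on the three HICO metrics, respectively.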
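The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind deformable attention over multi-scale feature maps, not the FGDHOI implementation. It assumes a single query and a single head, and it takes the sampling offsets and attention logits as plain inputs, whereas a trained model would normally predict them from the query. The names `bilinear_sample` and `deformable_attention` are hypothetical, and the value and output projections are omitted.

```python
import math


def bilinear_sample(fmap, x, y):
    """Bilinearly sample an H x W x C feature map (nested lists) at
    normalized coordinates (x, y) in [0, 1]; outside the map counts as zero."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    # Convert normalized coordinates to pixel space (pixel centers at i + 0.5).
    px, py = x * w - 0.5, y * h - 0.5
    x0, y0 = math.floor(px), math.floor(py)
    out = [0.0] * c
    for yy, wy in ((y0, 1.0 - (py - y0)), (y0 + 1, py - y0)):
        for xx, wx in ((x0, 1.0 - (px - x0)), (x0 + 1, px - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                for k in range(c):
                    out[k] += wy * wx * fmap[yy][xx][k]
    return out


def softmax(values):
    """Numerically stable softmax over a flat list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_attention(fmaps, ref, offsets, logits):
    """Aggregate features for one query from several feature-map levels.

    fmaps:   list of feature maps, one per scale (each H x W x C)
    ref:     (x, y) reference point, normalized to [0, 1]
    offsets: offsets[l][p] = (dx, dy), normalized, for level l and point p
    logits:  logits[l][p] = unnormalized attention score for that sample
    """
    # Attention weights are normalized jointly over all levels and points.
    weights = softmax([s for level in logits for s in level])
    channels = len(fmaps[0][0][0])
    out = [0.0] * channels
    i = 0
    for fmap, level_offsets in zip(fmaps, offsets):
        for dx, dy in level_offsets:
            sample = bilinear_sample(fmap, ref[0] + dx, ref[1] + dy)
            for k in range(channels):
                out[k] += weights[i] * sample[k]
            i += 1
    return out


# Toy usage: a 4x4 and a 2x2 single-channel map, one sampling point per level.
fine = [[[2.0] for _ in range(4)] for _ in range(4)]
coarse = [[[4.0] for _ in range(2)] for _ in range(2)]
result = deformable_attention(
    [fine, coarse],
    ref=(0.5, 0.5),
    offsets=[[(0.1, 0.0)], [(0.0, -0.1)]],
    logits=[[0.0], [0.0]],
)
print(result)
```

Because each query attends to only a handful of sampled locations per scale rather than to every pixel, this style of attention stays cheap on high-resolution feature maps while still mixing information across scales.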