Human-Object Interaction Recognition Integrating Multi-level Visual Features

doi:10.11896/jsjkx.220700012

Abstract

Computer-vision-based human action recognition is widely applied in video surveillance, intelligent driving, human-computer interaction, multimedia content auditing, and other fields. Human-object interaction (HOI) is one of its core components. Most existing HOI recognition models are built on multi-stream convolutional neural networks (CNNs) and capture visual features only superficially: they do not fully exploit the key regions of the human body or the deep semantic relationships between humans and objects. To address this problem, this paper proposes a hierarchical graph neural network (HGNN) model. HGNN explicitly models the critical regions of the human body and the human-object interactions in the scene, from local to global, and uses an attention pooling mechanism (AttPool) to remove redundant information and noise from the graph. A graph convolutional network then captures the deep semantic relationships between graph nodes and aggregates and refines the initial features extracted by the CNN, yielding a feature representation that reflects the essential characteristics of the human-object interaction. In addition, interim supervised classification on the intermediate graph constrains the model to better learn the human patterns of interactive actions and keeps it from becoming biased toward the interacting objects. Finally, a multi-task loss function is designed to train HGNN effectively. Extensive experiments on the public V-COCO benchmark show that HGNN is adaptive and robust for human-object interaction detection: it outperforms previous graph-neural-network-based methods by a large margin and also performs better than most recent CNN-based models.
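The abstract describes HGNN only at a high level. As a rough illustration of how its stages could fit together (graph convolution over region nodes, attention-based pooling from a local graph to a coarser one, an interim classifier on the intermediate graph, and a combined multi-task loss), here is a minimal NumPy sketch. It is not the authors' implementation. The node features are random stand-ins for CNN region features. The fully connected graph, the layer sizes, the mean-pooling readout and `interim_weight` are assumptions. `attention_pool` is a simplified top-k attention pooling, not the full AttPool mechanism. All function names are hypothetical.

```python
import numpy as np


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the symmetric GCN normalization."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(x, adj, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    return np.maximum(normalized_adjacency(adj) @ x @ weight, 0.0)


def attention_pool(x, adj, att_vec, ratio=0.5):
    """Simplified top-k attention pooling (an assumption, NOT the full AttPool design).

    Each node gets a softmax attention score from X @ att_vec. The top
    ceil(ratio * N) nodes are kept, their features are scaled by their
    scores, and the adjacency is restricted to the kept nodes.
    """
    logits = x @ att_vec
    scores = np.exp(logits - logits.max())
    scores /= scores.sum()
    k = max(1, int(np.ceil(ratio * x.shape[0])))
    keep = np.sort(np.argsort(-scores)[:k])       # indices of retained nodes
    return x[keep] * scores[keep][:, None], adj[np.ix_(keep, keep)], keep


def bce_with_logits(logits, targets):
    """Numerically stable mean binary cross-entropy for multi-label actions."""
    return float(np.mean(np.maximum(logits, 0.0) - logits * targets
                         + np.log1p(np.exp(-np.abs(logits)))))


def multi_task_loss(final_logits, interim_logits, targets, interim_weight=0.5):
    """Final-graph loss plus a weighted interim (middle-graph) loss; the weight is hypothetical."""
    return (bce_with_logits(final_logits, targets)
            + interim_weight * bce_with_logits(interim_logits, targets))


# Toy forward pass: random stand-ins for CNN features of body-part / object nodes.
rng = np.random.default_rng(0)
n_nodes, in_dim, hid_dim, n_actions = 6, 4, 8, 3
x = rng.normal(size=(n_nodes, in_dim))
adj = np.ones((n_nodes, n_nodes)) - np.eye(n_nodes)   # assumed fully connected graph

w1 = rng.normal(size=(in_dim, hid_dim))
w2 = rng.normal(size=(hid_dim, hid_dim))
att = rng.normal(size=hid_dim)
w_interim = rng.normal(size=(hid_dim, n_actions))
w_final = rng.normal(size=(hid_dim, n_actions))

h1 = gcn_layer(x, adj, w1)                            # local-level graph
interim_logits = h1.mean(axis=0) @ w_interim          # interim classifier on the middle graph
h_pool, adj_pool, kept = attention_pool(h1, adj, att, ratio=0.5)
h2 = gcn_layer(h_pool, adj_pool, w2)                  # coarser, more global graph
final_logits = h2.mean(axis=0) @ w_final

targets = np.array([1.0, 0.0, 1.0])                   # multi-hot action labels
loss = multi_task_loss(final_logits, interim_logits, targets)
print(f"kept nodes: {kept}, loss: {loss:.4f}")
```

A multi-label binary cross-entropy is used here because a person in V-COCO can perform several actions at once. Scoring the interim and final predictions against the same labels is only one plausible reading of "interim supervised classification"; the paper may define or weight its loss terms differently.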

Key words: Computer vision, Human action recognition, Human-object interaction, Deep learning, Graph neural network

CLC Number: TP391

LI Bao-zhen, ZHANG Jin, WANG Bao-lu, YU Ping. Human-Object Interaction Recognition Integrating Multi-level Visual Features[J]. Computer Science, 2022, 49(11A): 220700012-8.
