Computer Science ›› 2022, Vol. 49 ›› Issue (1): 225-232. doi: 10.11896/jsjkx.201100185

• Computer Graphics & Multimedia •

Interior Human Action Recognition Method Based on Prior Knowledge of Scene

LIU Xin1, YUAN Jia-bin1,2, WANG Tian-xing1   

  1 School of Computer Science and Technology,Nanjing University of Aeronautics and Astronautics,Nanjing 211106,China
    2 Information Department(Informationization Technology Center),Nanjing University of Aeronautics and Astronautics,Nanjing 211106,China
  • Received:2020-11-26 Revised:2021-04-01 Online:2022-01-15 Published:2022-01-18
  • About author:LIU Xin,born in 1995,postgraduate.His main research interests include deep learning and action recognition.
    YUAN Jia-bin,born in 1968,Ph.D,professor,is a senior member of China Computer Federation.His main research interests include deep learning,high performance computing and information security,etc.
  • Supported by:
    National Natural Science Foundation of China(61876121),Key Research and Development Program of Jiangsu Province(BE2017663),Foundation of Natural Science Research Program in Jiangsu Province Higher Education(19KJB520054) and Graduate Student Practice Innovation Projects in Jiangsu Province(SJCX20_1119).

Abstract: Recognition of human action in interior scenes is widely used in video content understanding, home-based elderly care, medical care and other fields. However, existing research concentrates on modelling the human action itself while ignoring the connection between the interior scene and the human action in a video. To make full use of the relevance between scene information and human motion, this paper studies human action recognition in interior scenes based on scene-prior knowledge and proposes the scene-prior knowledge inflated 3D ConvNet (SPI3D). Firstly, the ResNet152 network is adopted to extract scene features and classify the scene. Then, based on the classification results, quantified scene-prior knowledge is introduced and the overall objective function is optimized by constraining its weights. In addition, since most existing data sets focus on the characteristics of human action while their scene information remains complex and plain, an interior scene-action database (SADB) is established. Experimental results show that on SADB the recognition accuracy of SPI3D reaches 87.9%, 6% higher than that of I3D applied directly. Introducing scene-prior knowledge therefore yields better performance for human action recognition in interior scenes.
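The abstract does not spell out the exact form of the weight constraint, so the following is only a minimal PyTorch-style sketch of the general idea it describes: a ResNet-152 scene classifier produces a scene distribution, a scene-action prior matrix turns that distribution into per-action weights, and the training objective adds a penalty that discourages actions the scene prior deems unlikely. All names (SPI3DSketch, prior_matrix, key_frame, lam) and the specific fusion and penalty terms are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the SPI3D idea (not the authors' code).
# Assumptions: an I3D-style action backbone, a ResNet-152 scene classifier, and a
# fixed scene-action prior matrix P (rows: scenes, columns: actions) quantifying
# how plausible each action is in each interior scene.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class SPI3DSketch(nn.Module):
    def __init__(self, i3d_backbone, num_scenes, num_actions, prior_matrix, lam=0.5):
        super().__init__()
        self.i3d = i3d_backbone                      # 3D ConvNet producing action logits
        self.scene_net = models.resnet152(weights=None)
        self.scene_net.fc = nn.Linear(self.scene_net.fc.in_features, num_scenes)
        # Quantified scene-prior knowledge: prior[s, a] = weight of action a in scene s
        self.register_buffer("prior", prior_matrix)  # shape (num_scenes, num_actions)
        self.lam = lam                               # strength of the prior constraint

    def forward(self, clip, key_frame):
        action_logits = self.i3d(clip)                              # (B, num_actions)
        scene_probs = F.softmax(self.scene_net(key_frame), dim=1)   # (B, num_scenes)
        # Expected per-action prior weight under the predicted scene distribution
        prior_weights = scene_probs @ self.prior                    # (B, num_actions)
        return action_logits, prior_weights

    def loss(self, action_logits, prior_weights, action_labels):
        ce = F.cross_entropy(action_logits, action_labels)
        # Constraint term: penalize probability mass on actions the scene prior
        # considers unlikely for the recognized scene.
        action_probs = F.softmax(action_logits, dim=1)
        prior_penalty = (action_probs * (1.0 - prior_weights)).sum(dim=1).mean()
        return ce + self.lam * prior_penalty
```

In such a sketch, prior_matrix could be estimated from scene-action co-occurrence statistics of a database like SADB and normalized to [0, 1]; how the paper actually quantifies and applies the prior is detailed in the full text.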

Key words: Action recognition, Deep learning, Prior knowledge, Scene recognition

CLC Number: TP391
[1]KAY W,CARREIRA J,SIMONYAN K,et al.The kinetics human action video dataset[J].arXiv:1705.06950,2017.
[2]KUEHNE H,JHUANG H,GARROTE E,et al.HMDB:A Large Video Database for Human Motion Recognition[C]//2011 International Conference on Computer Vision.Barcelona:IEEE,2011:2556-2563.
[3]SOOMRO K,ZAMIR A R,SHAH M.UCF101:A Dataset of 101 Human Actions Classes From Videos in The Wild[J].arXiv:1212.0402,2012.
[4]SIMONYAN K,ZISSERMAN A.Two-Stream Convolutional Networks for Action Recognition in Videos[M].Advances in Neural Information Processing Systems.Berlin:Springer,2014:568-576.
[5]WANG L,XIONG Y,WANG Z,et al.Temporal Segment Networks:Towards Good Practices for Deep Action Recognition[C]//European Conference on Computer Vision.Cham:Springer,2016:20-36.
[6]JI S,XU W,YANG M,et al.3D Convolutional Neural Networks for Human Action Recognition[J].IEEE Transactions on Pattern Analysis & Machine Intelligence,2013,35(1):221-231.
[7]TRAN D,BOURDEV L,FERGUS R,et al.Learning Spatiotemporal Features with 3D Convolutional Networks[C]//2015 IEEE International Conference on Computer Vision(ICCV).Santiago:IEEE,2015:4489-4497.
[8]QIU Z,YAO T,MEI T.Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks[C]//2017 IEEE International Conference on Computer Vision(ICCV).Venice:IEEE,2017:5534-5542.
[9]HE K,ZHANG X,REN S,et al.Deep Residual Learning for Image Recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition(CVPR).Las Vegas:IEEE,2016:770-778.
[10]CARREIRA J,ZISSERMAN A.Quo vadis,action recognition? a new model and the kinetics dataset[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR).Honolulu:IEEE,2017:6299-6308.
[11]KIM J H,WON C S.Action Recognition in Videos Using Pre-trained 2D Convolutional Neural Networks[J].IEEE Access,2020,8:60179-60188.
[12]YANG W B,YANG H C,LU C,et al.Gesture Recognition Based on Skin Color Features and Convolutional Neural Network[J].Journal of Chongqing Technology and Business University(Natural Science Edition),2018,35(4):75-81.
[13]YAN H,LUO C,LI H,et al.Gait Recognition Method Based on Gait Energy Map Combined with VGG[J].Journal of Chongqing University of Technology(Natural Science),2020,34(5):166-172.
[14]MARSZALEK M,LAPTEV I,SCHMID C.Actions in context[C]//2009 IEEE Conference on Computer Vision and Pattern Recognition.Miami:IEEE,2009:2929-2936.
[15]ZHANG H B,LEI Q,CHEN D S,et al.Probability-based method for boosting human action recognition using scene context[J].IET Computer Vision,2016,10(6):528-536.
[16]DONG X,TAN L,ZHOU L N,et al.Short Video Behavior Recognition Combining Scene and Behavior Features[J].Journal of Frontiers of Computer Science and Technology,2020,14(10):1754-1761.
[17]MONTEIRO J,GRANADA R,MENEGUZZI F,et al.Using Scene Context to Improve Action Recognition[C]//23rd Iberoamerican Congress(CIARP 2018).Madrid,2018:954-961.
[18]VU T H,OLSSON C,LAPTEV I,et al.Predicting actions from static scenes[C]//European Conference on Computer Vision.Cham:Springer,2014:421-436.
[19]PENG B,LEI J,FU H,et al.Unsupervised Video Action Clustering via Motion-Scene Interaction Constraint[J].IEEE Transactions on Circuits and Systems for Video Technology,2018,30(1):131-144.
[20]PARK J,LEE J,JEON S,et al.Video Summarization by Learning Relationships between Action and Scene[C]//2019 IEEE/CVF International Conference on Computer Vision Workshop(ICCVW).Seoul:IEEE,2019:1545-1552.
[21]DING X,LUO Y,LI Q,et al.Prior knowledge-based deep learning method for indoor object recognition and application[J].Systems Science & Control Engineering,2018,6(1):249-257.
[22]ZHOU B,GARCIA A L,XIAO J,et al.Learning Deep Features for Scene Recognition using Places Database[J].Advances in Neural Information Processing Systems,2015,1:487-495.
[23]ZHOU B,LAPEDRIZA A,KHOSLA A,et al.Places:A 10 Million Image Database for Scene Recognition[J].IEEE Transactions on Pattern Analysis & Machine Intelligence,2018,40(6):1452-1464.
[24]DILIGENTI M,ROYCHOWDHURY S,GORI M.Integrating Prior Knowledge into Deep Learning[C]//IEEE International Conference on Machine Learning & Applications.IEEE,2017:920-923.
[25]XUAN D M,WANG J Y,YU H,et al.Application of prior knowledge in deep learning[J].Computer Engineering and Design,2015,36(11):3087-3091.
[26]SZEGEDY C,LIU W,JIA Y Q,et al.Going deeper with convolutions[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition(CVPR).Boston:IEEE,2015:1-9.
[27]DENG J,DONG W,SOCHER R,et al.ImageNet:A large-scale hierarchical image database[C]//IEEE Conference on Computer Vision & Pattern Recognition.Miami:IEEE,2009:248-255.
[28]STEWART R,ERMON S.Label-Free Supervision of Neural Networks with Physics and Domain Knowledge[C]//Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence.California:AAAI,2017:2576-2582.
[29]SCHLOSSER P,DAVID M,ARENS M.Investigation on Combining 3D Convolution of Image Data and Optical Flow to Generate Temporal Action Proposals[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops(CVPRW).Long Beach:IEEE,2019:2448-2456.
[30]YANG C,XU Y,SHI J,et al.Temporal Pyramid Network for Action Recognition[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR).Seattle:IEEE,2020:588-597.