Computer Science ›› 2022, Vol. 49 ›› Issue (1): 225-232. doi: 10.11896/jsjkx.201100185

• Computer Graphics & Multimedia •

Interior Human Action Recognition Method Based on Prior Knowledge of Scene

LIU Xin1, YUAN Jia-bin1,2, WANG Tian-xing1

  1 School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  2 Information Department (Informationization Technology Center), Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Received: 2020-11-26 Revised: 2021-04-01 Online: 2022-01-15 Published: 2022-01-18
  • Corresponding author: YUAN Jia-bin (jbyuan@nuaa.edu.cn)
  • About author: liuxinx@nuaa.edu.cn
    LIU Xin, born in 1995, postgraduate. His main research interests include deep learning and action recognition.
    YUAN Jia-bin, born in 1968, Ph.D., professor, is a senior member of China Computer Federation. His main research interests include deep learning, high performance computing and information security.
  • Supported by:
    National Key Research and Development Program of China (2017YFB0802303); National Natural Science Foundation of China (62076127, 61571226).
    National Natural Science Foundation of China (61876121), Key Research and Development Program of Jiangsu Province (BE2017663), Foundation of Natural Science Research Program in Jiangsu Province Higher Education (19KJB520054) and Graduate Student Practice Innovation Projects in Jiangsu Province (SJCX20_1119).

Abstract: Indoor human action recognition is widely used in video content understanding, home-based elderly care, medical care and other fields. Existing methods mostly model the human action itself and ignore the connection between the scene and the human action in a video. To make full use of the correlation between scene information and indoor human motion, this paper studies indoor human action recognition based on scene prior knowledge and proposes a two-stream inflated 3D action recognition network based on scene prior knowledge, the Scene-Prior Knowledge Inflated 3D ConvNet (SPI3D). First, a ResNet152 network extracts scene features for scene classification. Then, based on the scene classification results, quantified scene prior knowledge is introduced, and the overall objective function is optimized by constraining the weights. In addition, because most existing datasets focus on human action features while their scenes are complex and their scene features are not distinctive, an indoor scene-action recognition dataset, the Scene-Action DataBase (SADB), is built. Experimental results show that, on SADB, the recognition accuracy of SPI3D is 87.9%, which is 6% higher than that obtained by using the I3D network directly. This indicates that the indoor human action recognition model performs better after scene prior knowledge is introduced.

Key words: Action recognition, Deep learning, Prior knowledge, Scene recognition
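As a rough illustration of how scene information can bias action predictions, the sketch below re-weights action probabilities with a scene-conditioned compatibility table. It is not the SPI3D formulation: the abstract states that SPI3D constrains weights in the overall objective function, whereas this example applies a simpler post-hoc fusion of two probability vectors. The function name `fuse_scene_prior`, the scene and action labels, and all numeric values are hypothetical.

```python
def fuse_scene_prior(action_probs, scene_probs, prior):
    """Re-weight action probabilities by a scene-conditioned prior.

    action_probs: dict action -> probability from an action network
    scene_probs:  dict scene -> probability from a scene classifier
    prior:        dict scene -> dict action -> compatibility weight in [0, 1]
    All three inputs are illustrative assumptions, not values from the paper.
    """
    fused = {}
    for action, p_action in action_probs.items():
        # Expected compatibility of this action under the predicted scene distribution.
        compat = sum(
            p_scene * prior[scene].get(action, 0.0)
            for scene, p_scene in scene_probs.items()
        )
        fused[action] = p_action * compat
    total = sum(fused.values())
    if total == 0.0:
        # The prior rules out every action; fall back to the unmodified scores.
        return dict(action_probs)
    # Renormalise so the fused scores sum to one.
    return {action: value / total for action, value in fused.items()}


# Hypothetical example: the action network slightly prefers "sleeping",
# but the scene classifier is fairly sure the clip shows a kitchen.
action_probs = {"cooking": 0.4, "sleeping": 0.6}
scene_probs = {"kitchen": 0.9, "bedroom": 0.1}
prior = {
    "kitchen": {"cooking": 0.9, "sleeping": 0.1},
    "bedroom": {"cooking": 0.1, "sleeping": 0.9},
}
fused = fuse_scene_prior(action_probs, scene_probs, prior)
best_action = max(fused, key=fused.get)
```

In this toy setting the scene evidence flips the top prediction from "sleeping" to "cooking", which is the kind of effect a scene prior is meant to have when the action features alone are ambiguous.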

CLC Number:

  • TP391