Computer Science, 2022, Vol. 49, Issue (1): 225-232. doi: 10.11896/jsjkx.201100185

• Computer Graphics & Multimedia •


Interior Human Action Recognition Method Based on Prior Knowledge of Scene

LIU Xin1, YUAN Jia-bin1,2, WANG Tian-xing1

  1. School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  2. Information Department (Informationization Technology Center), Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Received: 2020-11-26 Revised: 2021-04-01 Online: 2022-01-15 Published: 2022-01-18
  • Corresponding author: YUAN Jia-bin (jbyuan@nuaa.edu.cn)
  • Author e-mail: liuxinx@nuaa.edu.cn
  • About author: LIU Xin, born in 1995, postgraduate. His main research interests include deep learning and action recognition.
    YUAN Jia-bin, born in 1968, Ph.D, professor, is a senior member of China Computer Federation. His main research interests include deep learning, high performance computing and information security.
  • Supported by:
    National Natural Science Foundation of China (61876121), Key Research and Development Program of Jiangsu Province (BE2017663), Foundation of Natural Science Research Program in Jiangsu Province Higher Education (19KJB520054) and Graduate Student Practice Innovation Projects in Jiangsu Province (SJCX20_1119).

Abstract: Indoor human action recognition is widely used in video content understanding, home-based elderly care, medical care, and other fields. Existing methods focus mainly on modelling the human action itself and ignore the connection between the scene and the action in a video. To make full use of the correlation between scene information and indoor human motion, this paper studies indoor human action recognition based on scene prior knowledge and proposes a two-stream inflated 3D action recognition network with scene prior knowledge, the Scene-Prior Knowledge Inflated 3D ConvNet (SPI3D). First, a ResNet152 network extracts scene features for scene classification. Then, based on the scene classification result, quantified scene prior knowledge is introduced, and the overall objective function is optimized by constraining the weights. In addition, because most existing datasets focus on human action features and contain complex scenes whose scene features are not distinctive, an indoor scene-action dataset, the Scene-Action DataBase (SADB), is built. Experimental results show that, on SADB, SPI3D achieves a recognition accuracy of 87.9%, which is 6% higher than that obtained by using I3D directly. Introducing scene prior knowledge therefore improves the performance of the indoor human action recognition model.

Key words: Scene recognition, Deep learning, Prior knowledge, Action recognition
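
The abstract describes the pipeline only at a high level: classify the scene, then use quantified scene prior knowledge to constrain the weights in the objective. The Python sketch below illustrates one possible reading of that idea and is not the SPI3D formulation itself. The scene and action labels, the `PRIOR` values, and the functions `prior_weighted_probs` and `prior_weighted_loss` are all hypothetical, introduced here for illustration only.

```python
import math

# Hypothetical label sets; the abstract does not list the SADB classes.
ACTIONS = ["sleeping", "cooking", "watching_tv"]

# Hypothetical quantified scene prior: PRIOR[scene][i] expresses how
# plausible ACTIONS[i] is in that scene. The values are made up.
PRIOR = {
    "bedroom": [0.70, 0.05, 0.25],
    "kitchen": [0.05, 0.85, 0.10],
    "living_room": [0.20, 0.10, 0.70],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prior_weighted_probs(action_logits, scene):
    """Re-weight action probabilities by the prior of the predicted scene."""
    probs = softmax(action_logits)
    weighted = [p * w for p, w in zip(probs, PRIOR[scene])]
    total = sum(weighted)
    return [w / total for w in weighted]


def prior_weighted_loss(action_logits, scene, true_action):
    """Cross-entropy on the prior-weighted probabilities (an assumed objective)."""
    probs = prior_weighted_probs(action_logits, scene)
    return -math.log(probs[ACTIONS.index(true_action)])


if __name__ == "__main__":
    logits = [1.0, 1.2, 0.9]  # stand-in for action-network outputs
    for scene in PRIOR:
        probs = prior_weighted_probs(logits, scene)
        best = ACTIONS[probs.index(max(probs))]
        print(scene, best, [round(p, 3) for p in probs])
```

In this sketch the scene classifier is reduced to a given scene label, and the prior acts multiplicatively on the action probabilities, so an action that is implausible in the predicted scene is penalized in both prediction and loss. In the paper's setting the logits would come from the action network and the scene label from the ResNet152 classifier.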


  • CLC number: TP391