Computer Science (计算机科学), 2022, Vol. 49, Issue (1): 225-232. doi: 10.11896/jsjkx.201100185
LIU Xin (刘昕)1, YUAN Jia-bin (袁家斌)1,2, WANG Tian-xing (王天星)1
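Abstract: Indoor human action recognition is now widely used in video content understanding, home-based elderly care, medical care, and other fields. Existing methods mostly model the human action itself and ignore the relationship between the scene in a video and the action taking place in it. To make full use of the correlation between scene information and indoor human motion, this paper studies indoor human action recognition based on scene prior knowledge and proposes a two-stream inflated 3D action recognition network based on scene prior knowledge (Scene-Prior Knowledge Inflated 3D ConvNet, SPI3D). First, a ResNet152 network extracts scene features for scene classification. Then, based on the scene classification result, quantified scene prior knowledge is introduced, and the overall objective function is optimized by constraining the weights. In addition, because existing datasets mostly focus on human action features and contain complex scenes whose scene features are indistinct, the authors built an indoor scene-action recognition dataset (Scene-Action DataBase, SADB). Experimental results show that on the SADB dataset the SPI3D network reaches a recognition accuracy of 87.9%, which is 6% higher than that obtained by using the I3D network directly. Introducing scene prior knowledge therefore yields a better-performing indoor human action recognition model.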
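The abstract does not say how the scene prior is quantified or how the weight constraint enters the objective, so the following is only a minimal sketch of one possible reading: a cross-entropy action loss plus a penalty on probability mass assigned to actions that are implausible in the predicted scene. The function names, the `prior` table, the penalty form, and the `lam` coefficient are all assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(action_logits, true_action):
    """Standard cross-entropy for a single sample."""
    return -math.log(softmax(action_logits)[true_action])


def scene_prior_loss(action_logits, true_action, scene_probs, prior, lam=0.5):
    """Cross-entropy plus a scene-prior penalty (hypothetical formulation).

    scene_probs[s] : probability of scene s from a scene classifier
                     (the paper uses ResNet152 for this step).
    prior[s][a]    : assumed plausibility in [0, 1] of action a in scene s.
    lam            : assumed trade-off coefficient.
    """
    probs = softmax(action_logits)
    # Expected plausibility of each action under the predicted scene distribution.
    plausibility = [
        sum(p_s * row[a] for p_s, row in zip(scene_probs, prior))
        for a in range(len(probs))
    ]
    # Penalise probability mass placed on actions that the scene makes unlikely.
    penalty = sum(p * (1.0 - q) for p, q in zip(probs, plausibility))
    return cross_entropy(action_logits, true_action) + lam * penalty


# Toy setup: 2 scenes (kitchen, bedroom) and 3 actions (cooking, sleeping, reading).
PRIOR = [
    [1.0, 0.0, 0.5],  # kitchen
    [0.0, 1.0, 1.0],  # bedroom
]
loss = scene_prior_loss([2.0, 1.0, 0.1], 0, [0.9, 0.1], PRIOR)
```

With a prior of all ones the penalty vanishes and the loss reduces to plain cross-entropy, so the prior term only influences training when the scene actually rules some actions out.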