Computer Science ›› 2022, Vol. 49 ›› Issue (1): 225-232. doi: 10.11896/jsjkx.201100185

Interior Human Action Recognition Method Based on Prior Knowledge of Scene

LIU Xin1, YUAN Jia-bin1,2, WANG Tian-xing1

  1 School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  2 Information Department (Informationization Technology Center), Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Received: 2020-11-26 Revised: 2021-04-01 Online: 2022-01-15 Published: 2022-01-18
  • Corresponding author: YUAN Jia-bin (jbyuan@nuaa.edu.cn)
  • About author: LIU Xin (liuxinx@nuaa.edu.cn), born in 1995, postgraduate. His main research interests include deep learning and action recognition.
    YUAN Jia-bin, born in 1968, Ph.D., professor, is a senior member of the China Computer Federation. His main research interests include deep learning, high-performance computing and information security.
  • Supported by:
    National Natural Science Foundation of China (61876121), Key Research and Development Program of Jiangsu Province (BE2017663), Foundation of Natural Science Research Program in Jiangsu Province Higher Education (19KJB520054) and Graduate Student Practice Innovation Projects in Jiangsu Province (SJCX20_1119).

Abstract: Indoor human action recognition is widely used in video content understanding, home-based elderly care, medical care and other fields. Existing methods mostly model the human action itself and ignore the connection between the scene and the human action in a video. To make full use of the correlation between scene information and indoor human motion, this paper studies indoor human action recognition based on scene prior knowledge and proposes a two-stream inflated 3D action recognition network based on scene prior knowledge (Scene-Prior Knowledge Inflated 3D ConvNet, SPI3D). First, a ResNet152 network extracts scene features for scene classification. Then, based on the scene classification result, quantified scene prior knowledge is introduced, and the overall objective function is optimized by constraining the weights. In addition, because most existing datasets focus on human action features and contain complex scenes whose scene features are not distinctive, an indoor scene-action recognition dataset (Scene-Action DataBase, SADB) is built. Experimental results show that, on SADB, the recognition accuracy of SPI3D is 87.9%, which is 6% higher than that of using the I3D network directly. This indicates that the indoor human action recognition model performs better once scene prior knowledge is introduced.

Key words: Scene recognition, Deep learning, Prior knowledge, Action recognition
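The abstract does not give the exact form of the quantified scene prior or of the weight constraint, so the following is only an illustrative sketch of the general idea that a scene classifier's output can bias action predictions. It is not the SPI3D formulation: it re-weights action probabilities at inference time instead of constraining weights in the training objective. The scene and action names, the `prior` table and the function `apply_scene_prior` are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def apply_scene_prior(action_logits, scene_probs, prior):
    """Re-weight action probabilities by a scene-conditioned prior.

    prior[s][a] is a hypothetical compatibility weight in (0, 1] for
    action a occurring in scene s. This is an assumed stand-in for the
    paper's quantified scene prior knowledge, not its actual definition.
    """
    action_probs = softmax(action_logits)
    n_scenes = len(scene_probs)
    # Expected prior weight of each action under the scene distribution.
    weights = [
        sum(scene_probs[s] * prior[s][a] for s in range(n_scenes))
        for a in range(len(action_probs))
    ]
    weighted = [p * w for p, w in zip(action_probs, weights)]
    total = sum(weighted)
    return [x / total for x in weighted]


# Hypothetical toy setup: two scenes, three actions.
scenes = ["kitchen", "bedroom"]
actions = ["cooking", "sleeping", "reading"]
prior = [
    [1.0, 0.1, 0.5],  # kitchen
    [0.1, 1.0, 0.8],  # bedroom
]

action_logits = [1.0, 1.2, 0.5]  # stand-in for action-network scores
scene_probs = [0.9, 0.1]         # stand-in for scene-classifier output

raw = softmax(action_logits)
adjusted = apply_scene_prior(action_logits, scene_probs, prior)

raw_choice = actions[raw.index(max(raw))]
adjusted_choice = actions[adjusted.index(max(adjusted))]
print(raw_choice, adjusted_choice)
```

In this toy case the action scores alone slightly favour "sleeping", while the strong "kitchen" scene prediction shifts the decision to "cooking". It shows how scene information can resolve an ambiguous action score, under the assumptions stated above.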


