Computer Science, 2024, Vol. 51, Issue (8): 232-241. doi: 10.11896/jsjkx.230600143

• Computer Graphics & Multimedia •

Infrared Human Action Recognition Method Based on Multimodal Attention Network

WANG Chao1, TANG Chao1, WANG Wenjian2, ZHANG Jing3

  1 School of Artificial Intelligence and Big Data, Hefei University, Hefei 230601, China
  2 School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  3 Science Island Branch of Graduate School, University of Science and Technology of China, Hefei 230031, China
  • Received: 2023-06-16 Revised: 2023-08-24 Online: 2024-08-15 Published: 2024-08-13
  • Corresponding author: TANG Chao (tangchao@hfuu.edu.cn)
  • About author: (1747808376@qq.com)
    WANG Chao, born in 1998, postgraduate. His main research interests include machine learning and computer vision.
    TANG Chao, born in 1977, Ph.D, associate professor, master's supervisor, is a member of CCF (No. 40989M). His main research interests include artificial intelligence, pattern recognition, machine learning, and computer vision.
  • Supported by:
    National Natural Science Foundation of China (62076154, U21A20513), Natural Science Foundation of Anhui Province, China (2008085MF202), Research Projects of Hefei University (22050123010), Anhui Provincial Graduate Academic Innovation Project (2022xscx145) and Anhui Provincial College Student Innovation and Entrepreneurship Training Program Project (1602582519599861760).

Abstract: Human behavior recognition is a research hotspot in machine vision and pattern recognition. Many intelligent services require rapid and accurate recognition of human behavior, and the task has wide application value in fields such as intelligent monitoring and smart homes, so it has been widely studied by scholars at home and abroad. Human behavior recognition usually relies on visible-light video, but visible-light video is easily affected by illumination and is unsuitable for nighttime recognition. Because infrared information is less affected by illumination and helps protect privacy, recognition methods based on infrared video are of great significance. However, deep learning networks are limited in their ability to learn representations from single-modality infrared data. To address this problem, an infrared human behavior recognition method based on a multimodal attention network is proposed. Because a deep learning network model cannot be trained on, or classify, video information directly, a preprocessing module first converts the video into infrared views; the Sobel operator and the L1-norm-based total variation (TV-L1) optical flow method are then applied to the infrared views to extract edge and optical flow information, yielding edge views and optical flow views. Second, the infrared, edge, and optical flow views are fed into a three-stream network that incorporates an attention mechanism module for feature learning. Third, the multimodal features extracted by each network of the three-stream network are fused. Finally, the fused feature vector is input to a random forest for training and classification. Experimental results on the public dataset NTU RGB+D and a self-built dataset indicate that the proposed method achieves good recognition performance. In the future, we will consider extending the method to more datasets to verify its effectiveness.

Key words: Multimodal, Attention mechanism, Three-stream network, Feature fusion, Random forest
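
To make two of the steps named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows how a Sobel operator turns a grayscale frame into an edge view, and how per-stream feature vectors could be fused before classification. Several things here are assumptions for illustration: the function names `sobel_edge_view` and `fuse_serial`, the zero-valued border handling, the toy frame, the placeholder feature values, and the choice of serial (concatenation) fusion, since the abstract does not state which fusion strategy is used. Plain Python is used so the sketch runs without dependencies; in practice an image library would likely be used instead.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edge_view(image):
    """Return the Sobel gradient magnitude of a 2D grayscale image.

    `image` is a list of equal-length rows of numbers. Border pixels are
    left at 0.0 (an illustrative assumption, not taken from the paper).
    """
    height, width = len(image), len(image[0])
    edges = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0.0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            edges[r][c] = math.hypot(gx, gy)
    return edges


def fuse_serial(*stream_features):
    """Serial fusion (assumed): concatenate per-stream feature vectors."""
    fused = []
    for features in stream_features:
        fused.extend(features)
    return fused


# Toy 5x5 "infrared frame" with a vertical step edge between columns 1 and 2.
frame = [[0, 0, 10, 10, 10] for _ in range(5)]
edge_view = sobel_edge_view(frame)

# Hypothetical feature vectors standing in for the outputs of the
# infrared, edge, and optical flow streams.
fused_vector = fuse_serial([0.1, 0.2], [0.3], [0.4, 0.5])
```

In this toy frame the edge response is nonzero only in the interior columns next to the step and zero in the flat region. In the pipeline described by the abstract, a fused vector of this kind would then be passed to a random forest classifier, which is not shown here.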

CLC Number:

  • TP391