Computer Science, 2024, Vol. 51, Issue (8): 232-241. doi: 10.11896/jsjkx.230600143

• Computer Graphics & Multimedia •

Infrared Human Action Recognition Method Based on Multimodal Attention Network

WANG Chao¹, TANG Chao¹, WANG Wenjian², ZHANG Jing³

  1 School of Artificial Intelligence and Big Data, Hefei University, Hefei 230601, China
  2 School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  3 Science Island Branch of Graduate School, University of Science and Technology of China, Hefei 230031, China
  • Received: 2023-06-16; Revised: 2023-08-24; Online: 2024-08-15; Published: 2024-08-13
  • About author: WANG Chao, born in 1998, postgraduate. His main research interests include machine learning and computer vision.
    TANG Chao, born in 1977, Ph.D, associate professor, master's supervisor, is a member of CCF (No.40989M). His main research interests include artificial intelligence, pattern recognition, machine learning, and computer vision.
  • Supported by:
    National Natural Science Foundation of China (62076154, U21A20513), Natural Science Foundation of Anhui Province, China (2008085MF202), Research Projects of Hefei University (22050123010), Anhui Provincial Graduate Academic Innovation Project (2022xscx145) and Anhui Provincial College Student Innovation and Entrepreneurship Training Program Project (1602582519599861760).

Abstract: Human behavior recognition is a research hotspot in machine vision and pattern recognition. Many intelligent services, in fields such as intelligent monitoring and smart homes, require rapid and accurate recognition of human behavior, and the topic has been widely studied by scholars at home and abroad. Human behavior recognition usually relies on visible-light video, but visible-light video is easily affected by illumination and is unsuitable for nighttime recognition. Because infrared imaging is less affected by illumination and better protects privacy, behavior recognition based on infrared video is of great significance. However, deep learning networks are limited in their ability to learn from and represent single-modality infrared data. To address this problem, an infrared human behavior recognition method based on a multimodal attention network is proposed. Because a deep learning network cannot be trained on or classify raw video directly, a preprocessing module first converts the captured video into infrared views; edge information and optical flow information are then extracted from the infrared views with the Sobel operator and the L1-norm-based total variation (TV-L1) optical flow method, yielding edge views and optical flow views respectively. Secondly, the infrared, edge, and optical flow views are fed into a three-stream network fused with an attention mechanism module for feature learning. Then, the multimodal features extracted by each stream of the network are fused. Finally, the fused feature vector is input to a random forest for training and classification. Experimental results on the public dataset NTU RGB+D and a self-built dataset indicate that the proposed method achieves good recognition performance. In the future, we will consider extending the method to more datasets to further verify its effectiveness.
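
The edge-extraction and feature-fusion steps described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The function names `sobel_edge_view` and `fuse_features` are hypothetical, borders are simply left at zero, and fusion is assumed to be plain concatenation (serial fusion), which the abstract does not specify. In practice one would more likely use a library routine such as OpenCV's `cv2.Sobel` for the edge view.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edge_view(frame):
    """Return the Sobel gradient magnitude of a 2D grayscale frame.

    Border pixels are left at 0.0 (an assumption; the source does not
    say how borders are handled).
    """
    h, w = len(frame), len(frame[0])
    edges = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    p = frame[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            edges[y][x] = math.hypot(gx, gy)
    return edges


def fuse_features(infrared_feat, edge_feat, flow_feat):
    """Serial fusion (assumed): concatenate the per-stream feature vectors."""
    return list(infrared_feat) + list(edge_feat) + list(flow_feat)


# Toy 5x5 "infrared" frame with a vertical step edge between columns 1 and 2.
frame = [[0, 0, 10, 10, 10] for _ in range(5)]
edge_view = sobel_edge_view(frame)

# Placeholder feature vectors standing in for the three network streams.
fused = fuse_features([0.1, 0.2], [0.3], [0.4, 0.5])
```

On the toy frame the response is concentrated in the two columns next to the step and is zero in flat regions. The fused vector is what a classifier such as a random forest would receive as input.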

Key words: Multimodal, Attention mechanism, Three-stream network, Feature fusion, Random forest

CLC Number: 

  • TP391