Computer Science ›› 2022, Vol. 49 ›› Issue (6): 254-261. doi: 10.11896/jsjkx.210400272

• Computer Graphics & Multimedia •

Aerial Violence Recognition Based on Spatial-Temporal Graph Convolutional Networks and Attention Model

SHAO Yan-hua1, LI Wen-feng1, ZHANG Xiao-qiang1, CHU Hong-yu1, RAO Yun-bo2, CHEN Lu1   

  1. School of Information, Southwest University of Science and Technology, Mianyang, Sichuan 621000, China
    2. School of Information and Software Engineering, University of Electronic Science & Technology, Chengdu 610054, China
  • Received: 2021-04-26  Revised: 2021-08-11  Online: 2022-06-15  Published: 2022-06-08
  • About author: SHAO Yan-hua, born in 1982, Ph.D, lecturer, is a member of China Computer Federation. His main research interests include computer vision and machine learning.
    LI Wen-feng, born in 1997, postgraduate. His main research interests include computer vision and related topics.
  • Supported by:
    National Natural Science Foundation of China (61601382), Project of Sichuan Provincial Department of Education (17ZB0454), Doctoral Fund of Southwest University of Science and Technology (19zx7123) and Longshan Talent of Southwest University of Science and Technology (18LZX632).

Abstract: Violence in public areas occurs frequently, and video surveillance is of great significance for maintaining public safety. Compared with fixed cameras, unmanned aerial vehicles (UAVs) offer mobile surveillance. However, in aerial images, the rapid movement of the UAV and the changes of its posture and altitude cause motion blur and large scale changes of the target. To address this problem, an attention-enhanced spatial-temporal graph convolutional network (AST-GCN) is designed to recognize violent behavior in aerial video. The proposed method consists of two steps: a key-frame detection network completes the initial localization, and the AST-GCN completes behavior recognition from sequence features. Firstly, for locating violence in video, a cascade key-frame detection network based on human pose estimation is designed to detect violence key frames and preliminarily determine when the violence occurs. Secondly, skeleton information of multiple frames around the key frames is extracted from the video sequence, and the skeleton data are preprocessed, including normalization, screening and completion, to improve robustness to different scenes and to partially missing key joints; a skeleton spatial-temporal representation matrix is then constructed from the extracted skeleton information. Finally, the AST-GCN analyzes the multi-frame human skeleton information, integrating an attention module to improve the feature representation ability, and completes the recognition of violent behavior. The method is validated on a self-built aerial violence dataset, and experimental results show that AST-GCN can recognize violence in aerial scenes with a recognition accuracy of 86.6%. The proposed method has important engineering value and scientific significance for aerial video surveillance and human pose understanding applications.
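To make the attention-enhanced spatial-temporal graph convolution concrete, the sketch below shows one possible building block operating on skeleton sequences of shape (batch, channels, frames, joints): a spatial graph convolution over the skeleton adjacency, a temporal convolution over the frame axis, and a squeeze-and-excitation style channel attention. This is a minimal PyTorch illustration under assumed settings (18 joints, 9-frame temporal kernel, reduction ratio 4); the class name STGCNAttentionBlock, the placeholder adjacency matrix and all hyper-parameters are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of one attention-augmented spatial-temporal graph
# convolution block; names and hyper-parameters are illustrative only.
import torch
import torch.nn as nn


class STGCNAttentionBlock(nn.Module):
    """Spatial graph conv + temporal conv + SE-style channel attention."""

    def __init__(self, in_channels, out_channels, adjacency,
                 temporal_kernel=9, reduction=4):
        super().__init__()
        # Fixed (normalized) skeleton adjacency of shape (V, V), V = joints.
        self.register_buffer("A", adjacency)
        # Spatial graph convolution: 1x1 conv followed by graph aggregation.
        self.spatial = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        # Temporal convolution along the frame axis.
        pad = (temporal_kernel - 1) // 2
        self.temporal = nn.Sequential(
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels,
                      kernel_size=(temporal_kernel, 1), padding=(pad, 0)),
            nn.BatchNorm2d(out_channels),
        )
        # Squeeze-and-excitation style channel attention.
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_channels, out_channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels // reduction, out_channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # x: (batch, channels, frames, joints)
        x = self.spatial(x)
        x = torch.einsum("nctv,vw->nctw", x, self.A)  # aggregate over skeleton graph
        x = self.temporal(x)
        x = x * self.attention(x)                     # re-weight feature channels
        return self.relu(x)


if __name__ == "__main__":
    V = 18                            # e.g. 18 pose-estimation joints (assumed)
    A = torch.eye(V)                  # placeholder adjacency; a real one encodes bone links
    block = STGCNAttentionBlock(3, 64, A)
    clip = torch.randn(2, 3, 30, V)   # (batch, xy+confidence, frames, joints)
    print(block(clip).shape)          # -> torch.Size([2, 64, 30, 18])
```

In a full model, several such blocks would be stacked and followed by global pooling and a binary violence / non-violence classifier; the exact attention placement and graph partitioning strategy in the paper may differ from this sketch.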

Key words: Aerial photography, Attention mechanism, Cascade network, Human pose estimation, Spatial-temporal graph convolutional network, Violence recognition

CLC Number: TP391