Computer Science, 2022, 49(6): 254-261. doi: 10.11896/jsjkx.210400272

• Computer Graphics & Multimedia •

  • Corresponding author: LI Wen-feng (lwf.swust@qq.com)
  • About author: (syh@cqu.edu.cn)
Aerial Violence Recognition Based on Spatial-Temporal Graph Convolutional Networks and Attention Model

SHAO Yan-hua1, LI Wen-feng1, ZHANG Xiao-qiang1, CHU Hong-yu1, RAO Yun-bo2, CHEN Lu1   

  1. School of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan 621000, China
    2. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China
  • Received:2021-04-26 Revised:2021-08-11 Online:2022-06-15 Published:2022-06-08
  • About author: SHAO Yan-hua, born in 1982, Ph.D, lecturer, is a member of China Computer Federation. His main research interests include computer vision and machine learning.
    LI Wen-feng, born in 1997, postgraduate. His main research interests include computer vision.
  • Supported by:
    National Natural Science Foundation of China (61601382), Project of Sichuan Provincial Department of Education (17ZB0454), Doctoral Fund of Southwest University of Science and Technology (19zx7123) and Longshan Talent of Southwest University of Science and Technology (18LZX632).



Abstract: Violence in public areas occurs frequently, and video surveillance is of great significance for maintaining public safety. Compared with fixed cameras, unmanned aerial vehicles (UAVs) offer surveillance mobility. However, in aerial imagery, the rapid movement of the UAV and changes in its posture and height cause motion blur and large scale variation of the target. To address this problem, an attention spatial-temporal graph convolutional network (AST-GCN), which integrates an attention mechanism, is designed to recognize violent behavior in aerial video. The proposed method consists of two steps: a key-frame detection network completes the initial localization, and the AST-GCN network completes behavior recognition from sequence features. First, for the localization of violence in video, a key-frame cascade detection network is designed to detect violence key frames based on human pose estimation and preliminarily judge when the violence occurs. Second, the skeleton information of multiple frames around each key frame is extracted from the video sequence, and the skeleton data is pre-processed by normalization, screening and completion, which improves robustness to different scenes and to partially missing joints; a skeleton temporal-spatial representation matrix is then constructed from the extracted skeleton information. Finally, the AST-GCN network analyzes the multi-frame human skeleton information, with an integrated attention module that improves feature expression ability, to complete the recognition of violent behavior. The method is validated on a self-built aerial violence dataset, and experimental results show that AST-GCN can recognize violence in aerial scenes with an accuracy of 86.6%. The proposed method has important engineering value and scientific significance for aerial video surveillance and behavior understanding applications.
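The abstract's pipeline rests on two building blocks: a graph convolution over the skeleton's joint adjacency followed by temporal aggregation, and a channel-attention reweighting of the resulting features. The toy sketch below illustrates these two steps in NumPy. It is not the paper's implementation: the joint count, chain adjacency, random weights, and the parameter-free sigmoid gate (a simplified stand-in for a learned SE-style attention module) are all assumptions for illustration.

```python
# Illustrative sketch (not the paper's implementation): one spatial-temporal
# graph convolution step over skeleton data, followed by an SE-style channel
# attention reweighting. Shapes, joint count and adjacency are toy assumptions.
import numpy as np

J = 5                      # number of skeleton joints (toy example)
T = 4                      # number of frames around the key frame
C_IN, C_OUT = 3, 8         # input channels (x, y, confidence) and output channels

# Toy skeleton graph: a chain of joints, plus self-loops.
A = np.eye(J)
for i in range(J - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
# Symmetric normalization D^-1/2 A D^-1/2, as in ST-GCN-style layers.
d = A.sum(axis=1)
A_norm = A / np.sqrt(np.outer(d, d))

rng = np.random.default_rng(0)
X = rng.normal(size=(T, J, C_IN))        # skeleton sequence: frames x joints x channels
W_spatial = rng.normal(size=(C_IN, C_OUT))

# Spatial graph convolution: aggregate each joint's neighbors, then project channels.
H = A_norm @ X @ W_spatial               # (T, J, C_OUT)

# Temporal aggregation: average over a 3-frame window per joint and channel.
H_t = np.stack([H[max(0, t - 1):t + 2].mean(axis=0) for t in range(T)])

# SE-style channel attention: squeeze (global average) -> excite (sigmoid gate).
z = H_t.mean(axis=(0, 1))                # (C_OUT,) channel descriptor
gate = 1.0 / (1.0 + np.exp(-z))          # sigmoid; no learned FC layers in this sketch
H_att = H_t * gate                       # reweight channels by attention

print(H_att.shape)  # -> (4, 5, 8)
```

In a trained model the projection `W_spatial` and the excitation gate would be learned, and the adjacency would encode the actual human-skeleton topology produced by the pose estimator.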

Key words: Aerial photography, Attention mechanism, Cascade network, Human pose estimation, Spatial-temporal graph convolution, Violence recognition

CLC number: TP391