Computer Science ›› 2022, Vol. 49 ›› Issue (6): 254-261. doi: 10.11896/jsjkx.210400272
邵延华1, 李文峰1, 张晓强1, 楚红雨1, 饶云波2, 陈璐1
SHAO Yan-hua1, LI Wen-feng1, ZHANG Xiao-qiang1, CHU Hong-yu1, RAO Yun-bo2, CHEN Lu1
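Abstract: Violent behavior occurs frequently in public areas, and video surveillance is important for maintaining public safety. Compared with fixed cameras, unmanned aerial vehicles (UAVs) offer flexible monitoring. In aerial imaging, however, the rapid motion of the UAV and changes in its attitude and altitude cause motion blur and large scale variation in the targets. To address this problem, an attention-based spatial-temporal graph convolutional network, AST-GCN (Attention Spatial-Temporal Graph Convolutional Networks), is designed to recognize violent behavior in aerial video. The method has two main steps: a key-frame detection network performs the initial localization, and the AST-GCN network then confirms the behavior from sequence features. First, to localize violent behavior in video, a cascaded key-frame detection network is designed that detects violent-behavior key frames based on human pose estimation and gives a preliminary estimate of when the violence occurs. Second, human skeleton information is extracted from multiple frames before and after the key frame in the video sequence; the skeleton data are normalized, filtered, and completed to improve robustness to different scenes and to partially missing joints, and a skeleton temporal-spatial representation matrix is built from the extracted skeleton information. Finally, spatial-temporal graph convolution analyzes the multi-frame skeleton information, with an attention module fused in to strengthen feature representation, and completes the recognition of violent behavior. Validation on a self-built aerial violent-behavior dataset shows that AST-GCN can recognize violent behavior in aerial scenes, with a recognition accuracy of 86.6%. The proposed method has engineering value and scientific significance for applications such as aerial video surveillance and behavior understanding.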
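The skeleton preprocessing step described in the abstract (completion of missing joints, normalization, and assembly of a temporal-spatial representation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the procedure, so the linear interpolation over time, the centroid and bounding-box-height normalization, the use of NaN to mark undetected joints, and the function names `complete_missing`, `normalize`, and `build_tensor` are all assumptions made here for illustration. The filtering step is not covered.

```python
import numpy as np


def complete_missing(seq):
    """Fill missing joints (NaN) by linear interpolation over time.

    seq: array of shape (T, V, 2) holding (x, y) for V joints over T frames.
    Assumption: NaN marks joints the pose estimator failed to detect.
    """
    seq = seq.copy()
    num_frames, num_joints, num_coords = seq.shape
    t = np.arange(num_frames)
    for v in range(num_joints):
        for c in range(num_coords):
            track = seq[:, v, c]
            ok = ~np.isnan(track)
            if ok.any():
                # np.interp holds the nearest valid value beyond the ends.
                seq[:, v, c] = np.interp(t, t[ok], track[ok])
            else:
                # Joint never observed: fall back to zero (an assumption).
                seq[:, v, c] = 0.0
    return seq


def normalize(seq):
    """Center each frame on its joint centroid and scale by skeleton height.

    Dividing by the per-frame bounding-box height reduces the scale
    variation caused by changes in camera altitude.
    """
    center = seq.mean(axis=1, keepdims=True)                    # (T, 1, 2)
    height = seq[..., 1].max(axis=1) - seq[..., 1].min(axis=1)  # (T,)
    scale = np.maximum(height, 1e-6)[:, None, None]             # avoid /0
    return (seq - center) / scale


def build_tensor(seq):
    """Rearrange (T, V, C) into (C, T, V): channels, frames, joints."""
    return np.transpose(seq, (2, 0, 1))
```

Applied in the order `build_tensor(normalize(complete_missing(seq)))`, a clip of T frames with V joints yields a (2, T, V) array, a layout that spatial-temporal graph convolution models commonly take as per-person input.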