Computer Science, 2021, Vol. 48, Issue (7): 206-212. doi: 10.11896/jsjkx.200900093

• Computer Graphics & Multimedia •

Temporal Modeling for Online Anomaly Detection

QING Lai-yun1, ZHANG Jian-gong1, MIAO Jun2

  1 School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  2 Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China
  • Received: 2020-09-13 Revised: 2020-10-25 Online: 2021-07-15 Published: 2021-07-02
  • Corresponding author: QING Lai-yun (lyqing@ucas.ac.cn)
  • About author: QING Lai-yun, born in 1974, Ph.D, professor, Ph.D supervisor, is a member of China Computer Federation. Her main research interests include multimedia, computer vision and machine learning.
  • Supported by:
    National Natural Science Foundation of China, General Program (61872333Y); Research Fund of Beijing Innovation Center for Future Chips (KYJJ2018004); Beijing Municipal Education Commission Science and Technology Plan Project (KM201911232003); Beijing Natural Science Foundation (4202025).

Abstract: Weakly supervised anomaly detection (WSAD) is a challenging task: only video-level labels (normal or anomalous) are available for supervision, yet the model must localize the temporal intervals in which anomalies occur. We adopt a multiple instance learning (MIL) ranking network for WSAD. Each video is split into a fixed number of segments; the video is treated as a bag and its segments as the instances in that bag. The instance classifier is trained with bag-level (video-level) labels only, while instance-level labels remain unknown. Because videos carry rich temporal information, we focus on temporal relationships for online anomaly detection in surveillance videos. Considering both global and local perspectives, we use a self-attention module to learn a weight for each instance. The video-level anomaly score is the linear weighted sum of the instance anomaly scores, with the self-attention values as weights, and the self-attention module is trained with a mean squared error loss. The online constraint allows only historical and current video clips to be used, not future frames. To model the temporal structure of video, we introduce LSTM and temporal convolutional networks (TCN) into the WSAD problem. For the latter, we explore a single-rate dilated temporal convolutional network and a pyramid dilated temporal convolutional network (PDTCN), which fuses multi-scale features obtained with different dilation rates. Experiments show that multi-scale temporal convolution outperforms single-rate temporal convolution, and that PDTCN combined with the complementary inner and outer bag loss achieves an AUC 3.2% higher than the baseline without temporal modeling on the UCF-Crime dataset.

Key words: Anomaly detection, Attention module, Multiple instance learning, Temporal convolutional network, Weakly-supervised learning
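
The attention-weighted video score described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the softmax normalization of the attention values, the hinge form of the ranking loss with a margin of 1, and all function names are assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Normalize raw attention values into weights that sum to 1 (assumed normalization)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def video_score(attention_logits, instance_scores):
    """Video-level anomaly score: attention-weighted sum of instance (segment) scores."""
    weights = softmax(attention_logits)
    return sum(w * s for w, s in zip(weights, instance_scores))


def attention_mse_loss(attention_logits, instance_scores, video_label):
    """Squared error between the video-level score and the bag label (0 = normal, 1 = anomalous)."""
    return (video_score(attention_logits, instance_scores) - video_label) ** 2


def mil_ranking_loss(anomalous_scores, normal_scores, margin=1.0):
    """Hinge ranking loss (assumed form): the top instance of an anomalous bag
    should outscore the top instance of a normal bag by at least `margin`."""
    return max(0.0, margin - max(anomalous_scores) + max(normal_scores))
```

With uniform attention values the video score reduces to the mean of the segment scores; a trained attention module would instead shift weight toward the segments that matter for the bag label.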

CLC Number: TP391
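
The online constraint and the pyramid of dilation rates mentioned in the abstract can be sketched as a causal dilated temporal convolution. This is a simplified illustration, not the paper's PDTCN: each segment is reduced to one scalar feature, the dilation rates (1, 2, 4) are hypothetical, and the branches are fused by simple averaging, which is an assumption since the abstract does not specify the fusion.

```python
def causal_dilated_conv(sequence, kernel, dilation):
    """output[t] = sum_k kernel[k] * sequence[t - k * dilation].

    Only the current and past positions are read, so the output at time t
    never depends on future segments; positions before the start count as zero.
    """
    out = []
    for t in range(len(sequence)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * sequence[idx]
        out.append(acc)
    return out


def pyramid_dilated_conv(sequence, kernel, dilations=(1, 2, 4)):
    """Run one causal branch per dilation rate and average the branches
    (averaging is an assumed fusion rule for this sketch)."""
    branches = [causal_dilated_conv(sequence, kernel, d) for d in dilations]
    return [sum(vals) / len(vals) for vals in zip(*branches)]
```

Larger dilation rates let a branch look further into the past with the same kernel size, so combining several rates mixes short-range and long-range temporal context while still respecting the no-future-frames constraint.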