Computer Science ›› 2024, Vol. 51 ›› Issue (1): 243-251. doi: 10.11896/jsjkx.230300134

• Computer Graphics & Multimedia •

Weakly Supervised Video Anomaly Detection Based on Dual Dynamic Memory Network

ZHOU Wenhao, HU Hongtao, CHEN Xu, ZHAO Chunhui

  1. School of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China
  • Received: 2023-03-16 Revised: 2023-09-22 Publication date: 2024-01-15 Release date: 2024-01-12
  • Corresponding author: ZHAO Chunhui (chhzhao@zju.edu.cn)
  • About the authors: ZHOU Wenhao (zhouwenhao@zju.edu.cn), born in 1998, master. His main research interest is video and image anomaly detection.
    ZHAO Chunhui, born in 1979, Ph.D., professor. Her main research interests include statistical machine learning and data mining for industrial applications.
  • Supported by:
    National Natural Science Foundation of China, Fund for Distinguished Young Scholars (62125306), and NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization (U1709211).

Abstract: Video anomaly detection aims to identify abnormal behavior at the frame level within a whole video. Weakly supervised methods train on both normal and abnormal videos, supplemented by video-level labels, and outperform unsupervised methods. However, current weakly supervised video anomaly detection methods cannot record the long-term modes (patterns) of a video, and some of them use information from future frames to improve detection, which prevents online application. To address this, the paper proposes, for the first time, a weakly supervised video anomaly detection method based on a dual dynamic memory network. The memory network contains two memory modules that record the long-term normal and abnormal modes of the video, respectively. To update video features and memory items collaboratively, a read operation enhances the features of video frames using the memory items in the memory modules, and a write operation updates the contents of the memory items using the features of video frames. The number of memory items is adjusted dynamically during training to suit the needs of different video surveillance scenarios. During training, a modality separation loss is introduced to increase the discrimination between memory items. During testing, only the memory items are required and no future video frames are involved, which enables accurate online detection. Experimental results on two public weakly supervised video anomaly detection datasets show that the proposed method outperforms all methods that can be applied online and remains highly competitive with methods that can only be applied offline.

Key words: Video anomaly detection, Weakly supervised learning, Memory network, Multiple instance learning, Deep learning
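The read and write operations described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption for illustration only: the attention-style addressing, the concatenation used for feature enhancement, the update rate `rate`, the memory sizes, and the function names `read` and `write` are hypothetical and follow a generic memory-network pattern rather than the authors' actual design. Dynamic adjustment of the number of memory items and the modality separation loss are not modeled.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def read(feature, memory):
    """Enhance one frame feature (d,) using memory items (m, d).

    Assumed scheme: dot-product attention over the memory items, then
    concatenation of the frame feature with the retrieved memory vector.
    """
    weights = softmax(memory @ feature)   # (m,) addressing weights
    retrieved = weights @ memory          # (d,) weighted sum of memory items
    return np.concatenate([feature, retrieved]), weights

def write(feature, memory, rate=0.1):
    """Update memory items from one frame feature.

    Assumed scheme: each item moves toward the feature in proportion
    to its addressing weight.
    """
    weights = softmax(memory @ feature)   # (m,)
    return memory + rate * weights[:, None] * (feature - memory)

# Hypothetical sizes: 8-dim features, 4 normal items, 3 abnormal items.
rng = np.random.default_rng(0)
d = 8
normal_memory = rng.normal(size=(4, d))
abnormal_memory = rng.normal(size=(3, d))
frame_feature = rng.normal(size=d)

# Read: only the current frame and the memories are needed (no future frames).
enhanced_n, w_n = read(frame_feature, normal_memory)
enhanced_a, w_a = read(frame_feature, abnormal_memory)

# Write: done during training only; shown here for the normal memory.
normal_memory_new = write(frame_feature, normal_memory)
```

In this sketch the read step depends only on the current frame feature and the stored memory items, which mirrors the abstract's point that testing needs no future frames.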

CLC Number: 

  • TP183