Real-time Accurate Object Tracking for Resource-constrained Edge Devices

doi:10.11896/jsjkx.231200167

Abstract

Real-time video analysis tasks often involve running computationally intensive deep neural network (DNN) models for object tracking. In practical applications, offloading multi-stream video analysis tasks to edge devices near the cameras has become increasingly important. However, these edge devices usually have very limited computing resources, which leads to poor tracking accuracy. The main causes are outdated detection results, accumulated tracking errors, and the inability to perceive new objects. To address these issues, a detection and tracking framework based on prediction and correction is proposed. The framework comprises three core components: 1) predictive detection propagation, which uses a lightweight prediction model to rapidly update outdated object bounding boxes so that they match the current frame; 2) a frame difference corrector, which uses frame difference information to regress erroneous bounding boxes back to their correct positions; 3) a new object detector, which discovers newly appearing objects during tracking by clustering frame difference features. Experimental results show that, compared with baseline methods, the framework improves accuracy by 19.4% to 34.7% across different traffic scenarios while maintaining real-time execution speed.

Key words: Edge device, Resource efficiency, Object detection, Object tracking
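
The abstract names its components without giving implementation details, so the following is only a minimal sketch of two of them under stated assumptions, not the authors' method. It assumes a constant-velocity shift in place of the lightweight prediction model, grayscale frames stored as nested lists, and 8-connected grouping in place of whatever clustering the paper applies to frame difference features. All function names and parameters (`propagate_box`, `find_new_objects`, `threshold`, `min_size`) are hypothetical.

```python
from collections import deque


def propagate_box(box, velocity, frames_elapsed):
    """Shift a stale (x1, y1, x2, y2) box by an assumed constant per-frame velocity."""
    dx, dy = velocity
    x1, y1, x2, y2 = box
    sx, sy = dx * frames_elapsed, dy * frames_elapsed
    return (x1 + sx, y1 + sy, x2 + sx, y2 + sy)


def frame_difference(prev, curr, threshold):
    """Return the (row, col) pixels whose absolute intensity change exceeds threshold."""
    changed = set()
    for r, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for c, (p, q) in enumerate(zip(row_prev, row_curr)):
            if abs(p - q) > threshold:
                changed.add((r, c))
    return changed


def cluster_pixels(pixels, min_size):
    """Group pixels into 8-connected components and return one box per large component."""
    remaining = set(pixels)
    boxes = []
    for seed in sorted(pixels):  # sorted for a deterministic output order
        if seed not in remaining:
            continue
        remaining.discard(seed)
        queue = deque([seed])
        component = []
        while queue:
            r, c = queue.popleft()
            component.append((r, c))
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    neighbour = (r + dr, c + dc)
                    if neighbour in remaining:
                        remaining.discard(neighbour)
                        queue.append(neighbour)
        if len(component) >= min_size:  # ignore small noisy blobs
            rows = [r for r, _ in component]
            cols = [c for _, c in component]
            boxes.append((min(cols), min(rows), max(cols), max(rows)))
    return boxes


def find_new_objects(prev, curr, tracked_boxes, threshold=30, min_size=4):
    """Cluster changed pixels that no tracked box explains; each cluster is a candidate new object."""
    changed = frame_difference(prev, curr, threshold)
    unexplained = {
        (r, c)
        for (r, c) in changed
        if not any(x1 <= c <= x2 and y1 <= r <= y2 for x1, y1, x2, y2 in tracked_boxes)
    }
    return cluster_pixels(unexplained, min_size)


# Toy 8x8 example: one tracked object moves diagonally, one new object appears.
prev_frame = [[0] * 8 for _ in range(8)]
curr_frame = [[0] * 8 for _ in range(8)]
for r, c in [(5, 1), (5, 2), (6, 1), (6, 2)]:   # tracked object, now at rows 5-6, cols 1-2
    curr_frame[r][c] = 255
for r, c in [(1, 5), (1, 6), (2, 5), (2, 6)]:   # new object at rows 1-2, cols 5-6
    curr_frame[r][c] = 255

tracked = [propagate_box((0, 4, 1, 5), velocity=(1, 1), frames_elapsed=1)]
new_boxes = find_new_objects(prev_frame, curr_frame, tracked)
print(tracked, new_boxes)  # [(1, 5, 2, 6)] [(5, 1, 6, 2)]
```

In this toy run the propagated box absorbs the motion of the already-tracked object, so only the unexplained blob is reported as a candidate new object. A real system would need a learned motion model and a correction step for boxes that drift, which this sketch does not attempt.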

CLC number: TP391.4

ZHANG Xinyi, TAN Guang. Real-time Accurate Object Tracking for Resource-constrained Edge Devices[J]. Computer Science, 2024, 51(11A): 231200167-9. https://doi.org/10.11896/jsjkx.231200167

