Computer Science (计算机科学) ›› 2023, Vol. 50 ›› Issue (9): 260-268. doi: 10.11896/jsjkx.230200167
刘宇博, 郭斌, 马可, 邱晨, 刘思聪
LIU Yubo, GUO Bin, MA Ke, QIU Chen, LIU Sicong
Abstract: A virtual robot is an intelligent software agent that can interact with people, typically characterized by real-time operation and interactivity. Focusing on virtual robots driven by visual context awareness, this paper investigates four aspects — lightweight object detection models and their compression, real-time keyframe extraction, system optimization, and interaction strategy — and builds a virtual robot system with strong real-time performance, high interactivity, and high extensibility on a resource-constrained edge platform. Specifically, for lightweight object detection and model compression, the performance and accuracy of SSD models with different backbone networks are first compared; the SSD model with a VGG16 backbone is then int8-quantized and pruned, raising the frame rate by 187% over the original model while keeping the accuracy loss within 0.1%. For real-time keyframe extraction, edge feature strength and HOG features are used to prefilter the video stream, relieving system load and cutting inference latency by an equivalent of 90%. For system optimization, a microservice architecture reduces cold-start latency by about 98%. For the interaction strategy, a state machine with timers models the situational context to realize context-driven behavior, and the output of the human-robot interaction is delivered as speech.
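The paper does not publish its compression code; the sketch below only illustrates the two techniques named in the abstract — L1-norm filter pruning followed by post-training int8 quantization — using standard PyTorch utilities on a toy convolutional backbone. The model, pruning ratio, and calibration data are placeholders, not the authors' actual SSD/VGG16 pipeline.

```python
# Toy stand-in for the VGG16-based SSD backbone; model, amounts, and
# calibration data are illustrative, not the paper's actual pipeline.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
from torch.ao.quantization import (QuantStub, DeQuantStub,
                                   get_default_qconfig, prepare, convert)

class TinyBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # marks the float -> int8 boundary
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.dequant = DeQuantStub()  # marks the int8 -> float boundary

    def forward(self, x):
        return self.dequant(self.features(self.quant(x)))

model = TinyBackbone().eval()

# 1) Structured L1-norm pruning: zero out the 30% of filters with the
#    smallest L1 norm in each conv layer (the masks are then baked in).
for m in model.features:
    if isinstance(m, nn.Conv2d):
        prune.ln_structured(m, name="weight", amount=0.3, n=1, dim=0)
        prune.remove(m, "weight")

# 2) Post-training static int8 quantization with a short calibration pass
#    (random tensors here; real calibration would use representative frames).
model.qconfig = get_default_qconfig("fbgemm")
prepared = prepare(model)
with torch.no_grad():
    for _ in range(8):
        prepared(torch.randn(1, 3, 300, 300))
quantized = convert(prepared)
print(quantized)
```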
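Similarly, a minimal sketch of the keyframe prefiltering idea: compute a cheap edge-strength score (Sobel gradient magnitude) and a HOG descriptor per frame, and pass a frame to the detector only when either differs sufficiently from the last accepted keyframe. The thresholds, working resolution, and HOG parameters below are illustrative assumptions, not values from the paper.

```python
# Hypothetical prefilter: forward a frame to the detector only when its
# edge strength or HOG features differ enough from the last keyframe.
import cv2
import numpy as np
from skimage.feature import hog

def edge_strength(gray):
    # Mean Sobel gradient magnitude as a crude global edge-strength score.
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    return float(np.mean(np.hypot(gx, gy)))

def is_keyframe(gray, last, edge_thresh=5.0, hog_thresh=0.15):
    if last is None:
        return True
    if abs(edge_strength(gray) - edge_strength(last)) > edge_thresh:
        return True
    h1 = hog(gray, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
    h2 = hog(last, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
    return np.linalg.norm(h1 - h2) > hog_thresh

cap = cv2.VideoCapture(0)          # any video source
last_key = None
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Downscale before feature extraction to keep the prefilter cheap.
    gray = cv2.resize(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (128, 128))
    if is_keyframe(gray, last_key):
        last_key = gray
        # ... run the (quantized) detector on `frame` here ...
```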
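The abstract attributes the roughly 98% cold-start reduction to microservice decomposition without detailing the design. One common pattern consistent with that claim is to keep the detector resident in a long-lived inference service so that requests never pay the model-load cost again; the Flask sketch below (endpoint name, artifact path, and I/O format all hypothetical) shows the shape of such a service.

```python
# Hypothetical long-lived inference microservice: the model is loaded once
# at startup, so per-request latency excludes the model-load "cold start".
from flask import Flask, request, jsonify
import torch

app = Flask(__name__)

# Loaded once when the service starts, not per request.
MODEL = torch.jit.load("detector_int8.pt").eval()   # placeholder artifact

@app.post("/detect")
def detect():
    # Expect a flattened 3x300x300 float image in the JSON body (assumed format).
    x = torch.tensor(request.json["pixels"],
                     dtype=torch.float32).view(1, 3, 300, 300)
    with torch.no_grad():
        y = MODEL(x)
    return jsonify(num_outputs=len(y))

if __name__ == "__main__":
    app.run(port=8080)
```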
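Finally, a minimal sketch of a state machine with timers of the kind the abstract describes: detections drive context transitions, a timer forces a fall-back to an idle state when the scene goes quiet, and the transition output is text that would be handed to a speech synthesizer. The states, labels, and timeout are invented for illustration.

```python
# Hypothetical timer-augmented state machine for context-driven interaction.
import time

class ContextMachine:
    def __init__(self, timeout=5.0):
        self.state = "IDLE"
        self.timeout = timeout          # seconds without detections before reset
        self.last_event = time.monotonic()

    def on_detection(self, label):
        # Any detection refreshes the timer and may switch the context.
        self.last_event = time.monotonic()
        if label == "person" and self.state == "IDLE":
            self.state = "GREETING"
            return "Hello! How can I help you?"   # handed to the TTS module
        return None

    def tick(self):
        # Called periodically; the timer enforces the fall-back transition.
        if self.state != "IDLE" and time.monotonic() - self.last_event > self.timeout:
            self.state = "IDLE"

machine = ContextMachine()
print(machine.on_detection("person"))   # -> greeting text for speech output
machine.tick()
```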