Computer Science ›› 2023, Vol. 50 ›› Issue (9): 260-268. doi: 10.11896/jsjkx.230200167

• Artificial Intelligence •

Design of Visual Context-driven Interactive Bot System

LIU Yubo, GUO Bin, MA Ke, QIU Chen, LIU Sicong   

  1. School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
  • Received: 2023-02-22 Revised: 2023-07-01 Online: 2023-09-15 Published: 2023-09-01
  • About author: LIU Yubo, born in 2000, postgraduate, is a member of China Computer Federation. His main research interest is multi-modal QA.
    GUO Bin, born in 1980, Ph.D., professor, doctoral supervisor. His main research interests include ubiquitous computing, mobile crowd sensing and big data intelligence.
  • Supported by:
    National Science Fund for Distinguished Young Scholars (62025205) and National Natural Science Foundation of China (62032020, 61725205, 62102317).

Abstract: Bots are intelligent software agents that interact with people, and they are usually characterized by real-time operation and interactivity. This paper focuses on bots driven by visual context awareness and explores four aspects: lightweight object detection models and their compression, real-time key frame extraction, system optimization, and interaction strategy. On this basis, it builds a strongly real-time, flexible, highly interactive and highly scalable bot system on resource-constrained edge devices. Specifically, for lightweight object detection models and compression, we first compare the speed and accuracy of different lightweight object detection models, and then compress the SSD model based on the VGG16 network to find a suitable compression strategy. Compressing the latest SSD model increases the frame rate by 187% compared with the original model, provided that the accuracy loss does not exceed 0.1%. For real-time key frame extraction, the input video stream is pre-screened to reduce system load, which is equivalent to reducing the inference delay by 90%. For system optimization, the use of microservices reduces the cold-start delay by about 98%. For the interaction strategy, a state machine with a timer models the context to achieve context-driven interaction, and the human-computer interaction output is delivered as speech.
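
As an illustration of the interaction strategy summarized above, the following is a minimal sketch of a state machine with a timer, not the authors' implementation. The class name `ContextStateMachine`, the two states, the watched label `"person"`, the threshold values and the spoken phrases are all hypothetical. The machine consumes the detection labels of each key frame and returns the text that a speech module would speak.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class ContextStateMachine:
    """Hypothetical two-state context model driven by detection labels."""
    target: str = "person"    # label to watch for (assumed)
    dwell_s: float = 2.0      # seconds the target must persist before speaking
    timeout_s: float = 5.0    # seconds without the target before resetting
    state: str = "IDLE"
    _first_seen: Optional[float] = None
    _last_seen: Optional[float] = None

    def step(self, labels: Set[str], now: float) -> Optional[str]:
        """Feed the labels of one key frame; return text to speak, or None."""
        seen = self.target in labels
        if seen:
            self._last_seen = now
        if self.state == "IDLE":
            if seen:
                # Start the dwell timer on the first sighting.
                if self._first_seen is None:
                    self._first_seen = now
                if now - self._first_seen >= self.dwell_s:
                    self.state = "ENGAGED"
                    return f"Hello, I can see a {self.target}."
            else:
                # Presence must be continuous, so a miss resets the timer.
                self._first_seen = None
        elif self.state == "ENGAGED":
            # Leave only after the target has been absent for timeout_s.
            if not seen and now - self._last_seen >= self.timeout_s:
                self.state = "IDLE"
                self._first_seen = None
                return "Goodbye."
        return None
```

Timestamps are passed in explicitly rather than read from a clock, so the sketch can be driven by frame timestamps and tested deterministically. A real system would likely need more states and per-context timers.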

Key words: Resource-constrained, Lightweight model, Model compression, Object detection, Context-driven

CLC Number: 

  • TP391