Computer Science ›› 2025, Vol. 52 ›› Issue (9): 259-268. doi: 10.11896/jsjkx.240400143

• Computer Graphics & Multimedia •

Multimodal Air-writing Gesture Recognition Based on Radar-Vision Fusion

LIU Wei, XU Yong, FANG Juan, LI Cheng, ZHU Yujun, FANG Qun, HE Xin

  1. School of Computer and Information, Anhui Normal University, Wuhu, Anhui 241002, China
  • Received: 2024-04-22 Revised: 2024-10-23 Online: 2025-09-15 Published: 2025-09-11
  • Corresponding author: XU Yong (yxull@ahnu.edu.cn)
  • Author contact: (2221012420@ahnu.edu.cn)
  • About author: LIU Wei, born in 2000, postgraduate. His main research interests include wireless intelligent sensing and deep learning.
    XU Yong, born in 1966, Ph.D., professor, master's supervisor. His main research interests include computer network security, IoT security, wireless intelligent sensing and deep learning.
  • Supported by:
    National Natural Science Foundation of China (62072004).

Abstract: Air-writing gesture recognition is a promising technology for human-computer interaction. A single sensor, such as a mmWave radar, a camera, or Wi-Fi, struggles to capture complete gesture features. To address this, a flexible Two-Stream Fusion Networks (TFNet) model is designed that can either fuse Air-writing Energy Images (AEIs) with Point Cloud Temporal Feature Maps (PTFMs) or operate on unimodal input alone. A robust and reliable multimodal air-writing gesture recognition system is also built. The system uses a hard trigger to start and stop multi-sensor data acquisition, then separately processes the image and point cloud data from the same time sequence to generate AEIs and PTFMs, which keeps the two modalities temporally aligned. Branch networks extract gesture appearance features and fine-grained motion features, and the decisions of the two branches are fused with adaptive weights. This avoids complex interactions between intermediate multimodal features and effectively reduces model loss. The model is evaluated on air-writing data for the ten digits 0-9 collected from multiple participants. The results indicate that the proposed model outperforms the baseline models in recognition accuracy and is robust, making it an effective tool for multi-sensor air-writing gesture recognition.
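The decision-level fusion described in the abstract can be illustrated with a minimal sketch. This page does not specify how TFNet computes its adaptive weights, so the code below assumes one scalar weight per branch (learnable in a real model) that is normalized with a softmax. The names `fuse_decisions` and `raw_weights` are hypothetical and are not taken from the paper. The sketch also shows how the same function can fall back to unimodal input when one branch is absent.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_decisions(logits_vision=None, logits_radar=None, raw_weights=(0.0, 0.0)):
    """Fuse per-branch class scores at the decision level.

    Assumption: each branch has one unnormalized scalar weight; the weights
    of the branches that are present are normalized with a softmax.
    A missing branch (None) is skipped, which gives unimodal operation.
    """
    branches = [
        (logits, weight)
        for logits, weight in zip((logits_vision, logits_radar), raw_weights)
        if logits is not None
    ]
    if not branches:
        raise ValueError("at least one branch output is required")

    # Per-branch class probabilities, stacked as (n_branches, ..., n_classes).
    probs = np.stack([softmax(np.asarray(l, dtype=float)) for l, _ in branches])
    # Normalized branch weights that sum to 1.
    weights = softmax(np.array([w for _, w in branches], dtype=float))
    # Weighted sum over the branch axis; no intermediate features interact.
    return np.tensordot(weights, probs, axes=1)


vision = np.array([2.0, 0.0, 0.0])  # appearance branch favors class 0
radar = np.array([0.0, 2.0, 0.0])   # motion branch favors class 1
fused = fuse_decisions(vision, radar, raw_weights=(1.0, 0.0))
radar_only = fuse_decisions(logits_radar=radar)
```

Because only the final class probabilities are combined, each branch can be trained or replaced independently. In this example the vision branch carries the larger weight, so the fused prediction follows it.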

Key words: mmWave radar, computer vision, deep learning, multimodal fusion, air-writing gesture recognition

CLC Number:

  • TP391