Multimodal Air-writing Gesture Recognition Based on Radar-Vision Fusion

doi:10.11896/jsjkx.240400143

Abstract

Air-writing gesture recognition is a promising technology for human-computer interaction. Extracting gesture features with a single sensor, such as mmWave radar, a camera, or Wi-Fi, fails to capture the complete characteristics of a gesture. This paper designs a flexible Two-Stream Fusion Network (TFNet) that fuses Air-writing Energy Images (AEIs) and Point Cloud Temporal Feature Maps (PTFMs) and can also operate on unimodal input. On this basis, a robust and reliable multimodal air-writing gesture recognition system is built. The system uses a hard trigger to start and stop multi-sensor data acquisition, and it processes the image and point cloud data captured within the same time window to generate AEIs and PTFMs, which temporally aligns the multimodal data. Branch networks extract features of gesture appearance and fine-grained motion information. The decision outputs of the two streams are combined by adaptive weighted fusion, which avoids complex interactions between intermediate multimodal features and effectively reduces model loss. Data for ten air-writing gestures representing the digits 0-9 are collected from multiple participants to evaluate the model. The results indicate that the proposed model outperforms the baseline models in recognition accuracy and is highly robust. The model shows significant advantages in air-writing gesture recognition tasks, making it an effective tool for multi-sensor air-writing gesture recognition.
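The decision-level fusion step described in the abstract can be illustrated with a minimal sketch. It assumes that each stream (AEI and PTFM) outputs one logit per gesture class and that the "adaptive weights" are two learnable scalars normalized with a softmax; the names `adaptive_weighted_fusion` and `fusion_params` are hypothetical, and the exact weighting scheme used in TFNet may differ. The sketch also shows the unimodal fallback: if one stream is missing, the other stream's probabilities are used directly.

```python
import math
from typing import List, Optional, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_weighted_fusion(
    aei_logits: Optional[Sequence[float]],
    ptfm_logits: Optional[Sequence[float]],
    fusion_params: Sequence[float] = (0.0, 0.0),
) -> List[float]:
    """Hypothetical decision-level fusion of the AEI and PTFM streams.

    Assumption: `fusion_params` stands in for two learnable scalars, one per
    stream; softmax turns them into positive weights that sum to 1. In a
    trained network they would be updated by backpropagation.
    If only one stream is available (unimodal input), that stream's class
    probabilities are returned unchanged.
    """
    if aei_logits is None and ptfm_logits is None:
        raise ValueError("at least one modality is required")
    if ptfm_logits is None:
        return softmax(aei_logits)
    if aei_logits is None:
        return softmax(ptfm_logits)
    if len(aei_logits) != len(ptfm_logits):
        raise ValueError("both streams must score the same set of classes")
    w_aei, w_ptfm = softmax(fusion_params)
    p_aei = softmax(aei_logits)
    p_ptfm = softmax(ptfm_logits)
    # Convex combination of per-class probabilities, so the result still sums to 1.
    return [w_aei * a + w_ptfm * b for a, b in zip(p_aei, p_ptfm)]


if __name__ == "__main__":
    # Hypothetical 3-class logits, for illustration only (the paper uses 10 digit classes).
    fused = adaptive_weighted_fusion([2.0, 0.5, 0.1], [0.2, 1.5, 0.1])
    print([round(p, 3) for p in fused])
```

Because fusion happens only on the final class probabilities, each branch can be trained and run on its own, which is one way the unimodal mode and the "no intermediate feature interaction" property mentioned above fit together.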

Key words: mmWave radar, computer vision, deep learning, multimodal fusion, air-writing gesture recognition

CLC Number: TP391

LIU Wei, XU Yong, FANG Juan, LI Cheng, ZHU Yujun, FANG Qun, HE Xin. Multimodal Air-writing Gesture Recognition Based on Radar-Vision Fusion[J].Computer Science, 2025, 52(9): 259-268.
