Computer Science, 2024, Vol. 51, Issue (4): 193-208. doi: 10.11896/jsjkx.230200205

• Computer Graphics & Multimedia •

Review of Vision-based Neural Network 3D Dynamic Gesture Recognition Methods

WANG Ruiping1,2, WU Shihong2, ZHANG Meihang3, WANG Xiaoping1

  1. 1 School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
    2 Research Institute of Yanguang, YGSOFT INC., Zhuhai, Guangdong 519085, China
    3 School of Mechanical Automation, Wuhan University of Science and Technology, Wuhan 430081, China
  • Received: 2023-02-27 Revised: 2023-05-15 Online: 2024-04-15 Published: 2024-04-10
  • Corresponding author: WANG Ruiping (ruiping.wang1986@ieee.com)
  • Supported by:
    National Natural Science Foundation of China (51975432).

Abstract: Dynamic gesture recognition has received widespread attention as an important means of human-computer interaction. Among the available approaches, vision-based recognition has become the preferred technology for the new generation of human-computer interaction because it is convenient to use and low in cost. Centered on artificial neural networks, this paper reviews research progress in vision-based gesture recognition, analyzes the development status of different types of artificial neural networks in gesture recognition, and surveys and summarizes the types and characteristics of the data to be recognized and of the training datasets. In addition, performance comparison experiments are carried out to objectively evaluate the different types of artificial neural networks, and the results are analyzed. Finally, the surveyed content is summarized, the challenges and open problems in this field are described, and the development trends of dynamic gesture recognition technology are discussed.

Key words: Dynamic gesture recognition, Human-computer interaction, Artificial neural networks, Convolutional neural network, Recurrent neural network, Attention mechanism, Hybrid neural network

CLC Number:

  • TP391