Computer Science ›› 2025, Vol. 52 ›› Issue (6A): 240400098-9. doi: 10.11896/jsjkx.240400098
边辉1, 孟畅乾2,3, 李子涵2,3, 陈子豪2,3, 谢雪雷2,3
BIAN Hui1, MENG Changqian2,3, LI Zihan2,3, CHEN Zihao2,3 and XIE Xuelei2,3
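Abstract: Sign language is an important means of communication among people with hearing impairments, and sign language recognition allows them to communicate with hearing people without barriers. Sign language recognition techniques have advanced alongside deep learning, but existing techniques often cannot recognize continuous signing. This paper therefore proposes a continuous sign language recognition method based on a graph convolutional network (Graph Convolution Network, GCN) and connectionist temporal classification with attention (Connectionist Temporal Classification/Attention, CTC/Attention). The method extracts features along both the spatial and the temporal dimension and incorporates a spatial attention mechanism that assigns weights to the skeleton points, emphasizing the informative spatial features, so that signing can be recognized continuously. It provides sequence alignment and contextual semantic modeling for the translation of continuous sign language sentences. First, skeleton-point data of sign language motions are collected with the MediaPipe framework and used to build a dataset of Chinese sign language skeleton keypoint coordinates. Based on these coordinates, a dynamic sign-word recognition method built on spatio-temporal graph convolutional networks (Spatio-Temporal Graph Convolutional Networks, ST-GCN) is designed. An encoder-decoder network based on GCN and CTC/Attention is then proposed for recognizing continuous sign language sentences. With only limited data available, the method is evaluated on SSLD, a self-built skeleton-point dataset. The experiments show an average character-level accuracy of 94.41% for continuous sign language recognition, indicating that the proposed model recognizes sign language well.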
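To make the sequence-alignment idea concrete, the sketch below shows greedy CTC decoding, the simplest way to turn per-frame class scores into a label sequence: take the best class in each frame, merge consecutive repeats, then drop the blank symbol. This is an illustration only and is not taken from the paper; the authors combine CTC with an attention decoder, and the blank index, the gloss vocabulary `VOCAB`, and the scores in `FRAME_SCORES` are all assumptions made up for the example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol

# Hypothetical gloss vocabulary, for illustration only.
VOCAB = {1: "I", 2: "LOVE", 3: "YOU"}

# Hypothetical per-frame scores: one row per video frame,
# one column per class (column 0 is the blank).
FRAME_SCORES = [
    [0.10, 0.80, 0.05, 0.05],
    [0.10, 0.70, 0.10, 0.10],
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.10, 0.70, 0.10],
    [0.10, 0.10, 0.70, 0.10],
    [0.60, 0.10, 0.10, 0.20],
    [0.10, 0.10, 0.10, 0.70],
]


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Collapse per-frame scores into a label sequence (greedy CTC decoding)."""
    # Step 1: pick the highest-scoring class in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Step 2: merge runs of identical consecutive labels.
    collapsed = [label for label, _ in groupby(best_path)]
    # Step 3: remove blanks, which only separate labels.
    return [label for label in collapsed if label != blank]


if __name__ == "__main__":
    labels = ctc_greedy_decode(FRAME_SCORES)
    print(" ".join(VOCAB[label] for label in labels))
```

Because blanks are removed only after repeats are merged, a blank between two identical labels keeps them distinct, which is how CTC can represent the same sign occurring twice in a row without needing frame-level alignment labels.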