Computer Science, 2025, Vol. 52, Issue (6A): 240400098-9. doi: 10.11896/jsjkx.240400098

• Image Processing & Multimedia Technology

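Continuous Sign Language Recognition Based on Graph Convolutional Network and CTC/Attention

BIAN Hui1, MENG Changqian2,3, LI Zihan2,3, CHEN Zihao2,3, XIE Xuelei2,3

  1 China Classification Society Qinhuangdao Branch, Qinhuangdao, Hebei 066000
    2 Hebei Provincial Key Laboratory of Parallel Robot and Mechatronic System, Yanshan University, Qinhuangdao, Hebei 066000
    3 Key Laboratory of Advanced Forging & Stamping Technology and Science, Ministry of Education, Yanshan University, Qinhuangdao, Hebei 066000
  • Publication date: 2025-06-16  Release date: 2025-06-12
  • Corresponding author: MENG Changqian (2274115641@qq.com)
  • About the author: (2274115641@qq.com)
  • Funding:
    National Natural Science Foundation of China (51305380); Natural Science Foundation of Hebei Province (E2015203144)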

Continuous Sign Language Recognition Based on Graph Convolutional Network and CTC/Attention

BIAN Hui1, MENG Changqian2,3, LI Zihan2,3, CHEN Zihao2,3 and XIE Xuelei2,3

  1 College of Mechanical Engineering, Qinhuangdao, Hebei 066000, China
    2 Hebei Provincial Key Laboratory of Parallel Robot and Mechatronic System, Yanshan University, Qinhuangdao, Hebei 066000, China
    3 Laboratory of Advanced Forging & Stamping Technology and Science, Ministry of Education, Yanshan University, Qinhuangdao, Hebei 066000, China
  • Online: 2025-06-16  Published: 2025-06-12
  • About author: BIAN Hui, born in 1982, Ph.D., associate professor. His main research interests include parallel robots, rehabilitation robots and ship detection robots.
    MENG Changqian, born in 2000, postgraduate. His main research interests include machine vision, rehabilitation robots and mechanical automation.
  • Supported by:
    National Natural Science Foundation of China (51305380) and Natural Science Foundation of Hebei Province (E2015203144).

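Abstract: Sign language is an important means of communication for people with hearing impairments, and sign language recognition can help them communicate with hearing people without barriers. Sign language recognition has advanced along with deep learning, but existing methods often cannot recognize continuous sign language. This paper therefore proposes a continuous sign language recognition method based on a graph convolutional network (GCN) and connectionist temporal classification combined with attention (CTC/Attention). The method extracts features along the spatial and the temporal dimension and incorporates a spatial attention mechanism that assigns weights to skeletal points, emphasizing informative spatial features to achieve continuous recognition. It provides sequence alignment and contextual semantic modeling for continuous sign language sentence translation. First, skeletal point data of sign language movements are collected with the MediaPipe framework and used to build a dataset of skeletal key-point coordinates for Chinese Sign Language. Based on these coordinates, a dynamic sign language word recognition method using a spatio-temporal graph convolutional network (ST-GCN) is designed. Then an encoder-decoder network based on GCN and CTC/Attention is proposed for continuous sign language sentence recognition. With limited data, the method is evaluated on the self-built skeletal point dataset SSLD. Experiments show an average character accuracy of 94.41% for continuous sign language recognition, indicating that the proposed model recognizes sign language well.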
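The abstract says spatial features are extracted by graph convolution over the skeleton, with a spatial attention mechanism that weights the skeletal points. The paper's layer definitions are not on this page, so the following is only a minimal pure-Python sketch under stated assumptions: a single hand graph that follows MediaPipe's 21-landmark hand topology, a symmetrically normalized adjacency D^-1/2 (A + I) D^-1/2 (a simplification of ST-GCN's partitioned adjacency), and a softmax attention score per joint from a hypothetical linear map `score_weights`. All function names are illustrative, not the authors' code.

```python
import math

# Hypothetical hand graph: MediaPipe Hands reports 21 landmarks per hand
# (0 = wrist, 1-4 thumb, 5-8 index, 9-12 middle, 13-16 ring, 17-20 pinky);
# these edges mirror that topology. The paper's actual graph may differ.
NUM_JOINTS = 21
HAND_EDGES = [
    (0, 1), (1, 2), (2, 3), (3, 4),
    (0, 5), (5, 6), (6, 7), (7, 8),
    (5, 9), (9, 10), (10, 11), (11, 12),
    (9, 13), (13, 14), (14, 15), (15, 16),
    (13, 17), (0, 17), (17, 18), (18, 19), (19, 20),
]


def normalized_adjacency(num_nodes, edges):
    """Return D^-1/2 (A + I) D^-1/2 as a nested list (self-loops included)."""
    a = [[1.0 if i == j else 0.0 for j in range(num_nodes)] for i in range(num_nodes)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention(features, score_weights):
    """Score each joint by a linear map of its features, normalized over joints."""
    scores = [sum(f * w for f, w in zip(feat, score_weights)) for feat in features]
    return softmax(scores)


def graph_conv_with_attention(features, adj, weight, score_weights):
    """One spatial GCN step on a single frame: H' = ReLU(A_hat (alpha * H) W).

    features: N x C_in joint features, e.g. MediaPipe (x, y, z) per joint
    adj:      N x N normalized adjacency
    weight:   C_in x C_out weight matrix (learned in a real model)
    """
    alpha = spatial_attention(features, score_weights)
    # Scale every joint's features by its attention weight.
    weighted = [[a * f for f in feat] for a, feat in zip(alpha, features)]
    n, c_in, c_out = len(features), len(weight), len(weight[0])
    # Aggregate over graph neighbours: A_hat @ weighted.
    agg = [[sum(adj[i][k] * weighted[k][c] for k in range(n)) for c in range(c_in)]
           for i in range(n)]
    # Linear transform followed by ReLU: max(0, agg @ W).
    out = [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(c_in)))
            for o in range(c_out)] for i in range(n)]
    return out, alpha


if __name__ == "__main__":
    adj = normalized_adjacency(NUM_JOINTS, HAND_EDGES)
    frame = [[0.1 * j, 0.05 * j, 0.0] for j in range(NUM_JOINTS)]  # dummy landmarks
    w = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    out, alpha = graph_conv_with_attention(frame, adj, w, score_weights=[1.0, 1.0, 0.0])
    print(len(out), len(out[0]), round(sum(alpha), 6))  # 21 2 1.0
```

In ST-GCN this spatial step is applied to every frame and followed by a temporal convolution along each joint's trajectory; that temporal part is omitted here for brevity.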

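Key words: continuous sign language recognition, graph convolutional network, connectionist temporal classification, MediaPipe framework, skeletal key points, spatio-temporal graph convolutional network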
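Connectionist temporal classification (CTC) lets the network emit a label or a special blank at every frame without frame-level annotation; a label sequence is recovered by merging consecutive repeats and then dropping blanks. A minimal best-path (greedy) decoder is sketched below, assuming index 0 is the blank and a hypothetical three-gloss vocabulary; the paper may instead use beam search or the attention decoder to produce its final output.

```python
BLANK = 0  # assumption: label index 0 is the CTC blank


def ctc_greedy_decode(frame_probs, blank=BLANK):
    """Best-path CTC decoding.

    frame_probs: T x V per-frame label probabilities (or scores).
    Picks the arg-max label in each frame, merges consecutive repeats,
    then removes blanks.
    """
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded, prev = [], None
    for label in best_path:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


if __name__ == "__main__":
    vocab = {1: "I", 2: "love", 3: "you"}  # hypothetical gloss vocabulary
    path = [1, 1, 0, 2, 2, 0, 0, 3]       # arg-max label of each of 8 frames
    probs = [[0.9 if v == lab else 0.1 / 3 for v in range(4)] for lab in path]
    print([vocab[i] for i in ctc_greedy_decode(probs)])  # ['I', 'love', 'you']
```

Because repeats are merged before blanks are removed, a genuinely repeated gloss needs a blank between its two occurrences: the best path [1, 0, 1] decodes to [1, 1], while [1, 1, 1] decodes to [1].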

Abstract: Sign language is an important means of communication for people with hearing impairments. Sign language recognition enables them to communicate with hearing people without barriers. With the development of deep learning, many sign language recognition techniques have emerged, but existing techniques often cannot perform continuous sign language recognition. Therefore, this paper proposes a continuous sign language recognition method based on a graph convolution network (GCN) and connectionist temporal classification with attention (CTC/Attention), which extracts features along the spatial and temporal dimensions, respectively. A spatial attention mechanism is incorporated to assign weights to skeletal points, highlighting effective spatial features and enabling continuous sign language recognition. The method realizes sequence alignment and contextual semantic modeling for continuous sign language sentence translation. Firstly, skeletal point data of sign language movements are collected with the MediaPipe framework, and a dataset of Chinese sign language skeletal key points is built from them. Next, a dynamic sign language word recognition method based on a spatio-temporal graph convolutional network (ST-GCN) is designed. Finally, an encoder-decoder network based on GCN and CTC/Attention is proposed to realize continuous sign language sentence recognition. With limited data, the proposed method is evaluated on the self-built skeletal point dataset SSLD. Experimental results show that the average continuous sign language recognition accuracy reaches 94.41%, demonstrating that the model has good sign language recognition ability.
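The abstract reports an average recognition accuracy of 94.41% but does not define the metric on this page. A common definition in continuous sign language and speech recognition is accuracy = 1 - (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in the minimum-edit (Levenshtein) alignment and N is the reference length. The sketch below assumes that definition, computed per sentence at character level and averaged over sentences; the example sentence pairs are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,               # delete r
                         cur[j - 1] + 1,            # insert h
                         prev[j - 1] + (r != h))    # substitute, or match at no cost
        prev = cur
    return prev[-1]


def char_accuracy(ref, hyp):
    """Accuracy = 1 - (S + D + I) / N with N = len(ref); can be negative."""
    return 1.0 - edit_distance(ref, hyp) / len(ref)


def mean_char_accuracy(pairs):
    """Average per-sentence accuracy over (reference, hypothesis) pairs."""
    return sum(char_accuracy(r, h) for r, h in pairs) / len(pairs)


if __name__ == "__main__":
    pairs = [
        ("我爱你", "我你"),      # "I love you" vs "I you": one deletion -> 2/3
        ("你好吗", "你好吗"),    # "How are you?": exact match -> 1
    ]
    print(round(mean_char_accuracy(pairs), 4))  # 0.8333
```

Because insertions count against the score, this accuracy can fall below zero for very noisy hypotheses, which is why it is usually reported alongside, or as 1 minus, the character error rate.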

Key words: Continuous sign language recognition, Graph convolutional network, Connectionist temporal classification, MediaPipe framework, Skeletal key points, Spatio-temporal graph convolutional network
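Hybrid CTC/Attention models, like the one named in the title, are commonly trained with an interpolated objective L = lam * L_CTC + (1 - lam) * L_att: the CTC term enforces a monotonic alignment between frames and glosses, while the attention decoder models context. Whether this paper uses exactly this objective, and with which weight, is not stated on this page. The sketch below computes the CTC term with the standard forward (alpha) recursion in log space and the attention term as teacher-forced cross-entropy; lam = 0.3 is an illustrative value, and start/end tokens are omitted for brevity.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """-log p(target | x) via the CTC forward (alpha) recursion.

    log_probs: T x V per-frame log-probabilities (encoder output after log-softmax).
    target:    label sequence without blanks, e.g. [1, 2].
    """
    ext = [blank]
    for label in target:
        ext += [label, blank]  # blank-interleaved target of length 2U + 1
    T, S = len(log_probs), len(ext)
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            cands = [alpha[s]]
            if s >= 1:
                cands.append(alpha[s - 1])
            # Skipping over a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new
    # Valid paths end on the last label or on the trailing blank.
    total = alpha[-1] if S == 1 else logsumexp(alpha[-1], alpha[-2])
    return -total


def attention_cross_entropy(step_log_probs, target):
    """Teacher-forced decoder loss: -sum_u log p(y_u | y_<u, x)."""
    return -sum(step[y] for step, y in zip(step_log_probs, target))


def joint_ctc_attention_loss(ctc_log_probs, dec_log_probs, target, lam=0.3):
    """L = lam * L_CTC + (1 - lam) * L_att (lam = 0.3 is illustrative)."""
    return (lam * ctc_neg_log_likelihood(ctc_log_probs, target)
            + (1 - lam) * attention_cross_entropy(dec_log_probs, target))


if __name__ == "__main__":
    # Hypothetical 3-frame encoder output over {blank, 1, 2} and 2-step decoder output.
    ctc_lp = [[math.log(p) for p in row]
              for row in ([0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5])]
    dec_lp = [[math.log(p) for p in row] for row in ([0.1, 0.8, 0.1], [0.2, 0.1, 0.7])]
    print(round(ctc_neg_log_likelihood(ctc_lp, [1, 2]), 4))  # 1.1973 (= -log 0.302)
    print(round(joint_ctc_attention_loss(ctc_lp, dec_lp, [1, 2]), 4))
```

Working in log space avoids the underflow that a probability-space forward pass would hit on long frame sequences; at inference, the two branches are often combined again by rescoring attention hypotheses with the CTC score.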

CLC number: U671.99