Computer Science ›› 2025, Vol. 52 ›› Issue (6A): 240400098-9. doi: 10.11896/jsjkx.240400098

• Image Processing & Multimedia Technology •

Continuous Sign Language Recognition Based on Graph Convolutional Network and CTC/Attention

BIAN Hui1, MENG Changqian2,3, LI Zihan2,3, CHEN Zihao2,3 and XIE Xuelei2,3

  1 College of Mechanical Engineering, Qinhuangdao, Hebei 066000, China
  2 Hebei Provincial Key Laboratory of Parallel Robot and Mechatronic System, Yanshan University, Qinhuangdao, Hebei 066000, China
  3 Laboratory of Advanced Forging & Stamping Technology and Science, Ministry of Education, Yanshan University, Qinhuangdao, Hebei 066000, China
  • Online: 2025-06-16 Published: 2025-06-12
  • About author: BIAN Hui, born in 1982, Ph.D, associate professor. His main research interests include parallel robots, rehabilitation robots, ship detection robots, etc.
    MENG Changqian, born in 2000, postgraduate. His main research interests include machine vision, rehabilitation robots and mechanical automation.
  • Supported by:
    National Natural Science Foundation of China (51305380) and Natural Science Foundation of Hebei Province (E2015203144).

Abstract: Sign language is an important means of communication for people with hearing impairment. Sign language recognition allows them to communicate with hearing people without barriers. Deep learning has driven the development of various sign language recognition techniques, but existing techniques often cannot handle continuous sign language recognition. This paper therefore proposes a continuous sign language recognition method based on a graph convolutional network (GCN) and connectionist temporal classification/attention (CTC/Attention), which extracts features from the spatial and temporal dimensions separately. A spatial attention mechanism is incorporated to assign weights to the skeletal key points, highlighting the effective spatial features. The method achieves sequence alignment and contextual semantic modeling for continuous sign language sentence translation. First, sign language skeletal key point data are collected with the MediaPipe framework, and a Chinese sign language skeletal key point dataset is built from them. Second, a dynamic sign language word recognition method based on a spatio-temporal graph convolutional network (ST-GCN) is designed. Finally, a method based on GCN and a CTC/Attention encoding network is proposed for continuous sign language sentence recognition. With limited data, the proposed method is evaluated on the self-built skeletal key point dataset SSLD. The experimental results show that the average continuous sign language recognition accuracy reaches 94.41%, indicating that the model has good sign language recognition ability.
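
The sequence alignment role that CTC plays in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It only shows the standard CTC collapsing rule, which maps a per-frame label path to a shorter gloss sequence by merging consecutive repeats and then dropping a blank symbol. The blank symbol "-", the function names and the example glosses are assumptions made for illustration.

```python
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol chosen for this sketch


def ctc_collapse(frame_labels, blank=BLANK):
    """Collapse a per-frame label path into an output gloss sequence."""
    # Step 1: merge runs of identical consecutive labels into one label.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove the blank symbol that separates genuine repeats.
    return [label for label in merged if label != blank]


def greedy_decode(frame_scores, vocab, blank=BLANK):
    """Pick the best-scoring label in every frame, then collapse the path."""
    path = []
    for scores in frame_scores:
        best_index = max(range(len(scores)), key=scores.__getitem__)
        path.append(vocab[best_index])
    return ctc_collapse(path, blank)


# Hypothetical per-frame scores over a three-symbol vocabulary.
vocab = [BLANK, "HELLO", "WORLD"]
frame_scores = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.2, 0.7],
]
decoded = greedy_decode(frame_scores, vocab)
```

A blank between two identical labels keeps both of them, which is how a sign repeated in a sentence survives the collapse. Without the blank, the repeats merge into one.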

Key words: Continuous sign language recognition, Graph convolutional network, Connectionist temporal classification, MediaPipe framework, Skeletal key point, Spatio-temporal graph convolutional network

CLC Number: 

  • U671.99