Computer Science ›› 2021, Vol. 48 ›› Issue (3): 60-70. doi: 10.11896/jsjkx.210100227

Special Issue: Advances on Multimedia Technology


Review of Sign Language Recognition, Translation and Generation

GUO Dan, TANG Shen-geng, HONG Ri-chang, WANG Meng   

  1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
    Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, Hefei 230601, China
    Intelligent Interconnected Systems Laboratory of Anhui Province, Hefei 230601, China
  • Received: 2021-01-29 Revised: 2021-02-19 Online: 2021-03-15 Published: 2021-03-05
  • About author: GUO Dan, born in 1983, professor. Her main research interests include machine learning, computer vision and multimedia content analysis.
  • Supported by:
    National Key Research and Development Program of China (2018YFC0830103), National Natural Science Foundation of China (61876058) and Fundamental Research Funds for the Central Universities of Ministry of Education of China (JZ2020HGTB0020).

Abstract: Sign language research is a typical cross-disciplinary topic involving computer vision, natural language processing, cross-media computing and human-computer interaction. It mainly comprises isolated sign language recognition, continuous sign language translation and sign language video generation. Sign language recognition and translation aim to convert sign language videos into textual words or sentences, while sign language generation synthesizes sign videos from spoken or textual sentences. In other words, sign language translation and generation are inverse processes. This paper reviews the latest progress in sign language research, introduces its background and challenges, and surveys typical methods and cutting-edge research on the recognition, translation and generation tasks. Based on the problems in current methods, future research directions for sign language are discussed.

Key words: Continuous sign language translation, Isolated sign language recognition, Machine translation, Sign language video generation, Video understanding

CLC Number: 

  • TP391.4