Computer Science, 2021, Vol. 48, Issue (3): 60-70. doi: 10.11896/jsjkx.210100227
Special topic: Advances in Multimedia Technology
郭丹, 唐申庚, 洪日昌, 汪萌
GUO Dan, TANG Shen-geng, HONG Ri-chang, WANG Meng
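Abstract: Sign language research is a typical interdisciplinary topic spanning computer vision, natural language processing, cross-media computing, and human-computer interaction. Its main tasks are isolated (discrete) sign language recognition, continuous sign language translation, and sign language video generation. Sign language recognition and translation aim to convert sign language videos into text words or sentences, whereas sign language generation synthesizes sign language videos from spoken or written sentences. In other words, recognition and translation on one side and generation on the other can be viewed as inverse processes. This paper surveys recent progress in sign language research. It introduces the background, current state, and open challenges of the field; reviews typical methods and frontier work on the recognition, translation, and generation tasks; and, in light of the problems in current methods, discusses future directions for sign language research.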