Computer Science ›› 2022, Vol. 49 ›› Issue (9): 155-161. doi: 10.11896/jsjkx.210800026
周乐员1, 张剑华1, 袁甜甜2, 陈胜勇1
ZHOU Le-yuan1, ZHANG Jian-hua1, YUAN Tian-tian2, CHEN Sheng-yong1
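Abstract: Enabling computers to understand what signers express has long been a highly challenging task: a system must account for both the temporal and the spatial information in sign language videos, as well as the complexity of sign language grammar. In continuous sign language recognition, the sign words and the sign actions share the same order; in continuous sign language translation, the generated natural-language sentence must read as spoken language, so its word order may differ from the order of the actions. To learn signers' expressions more accurately, this paper proposes a novel deep neural network that performs sign language recognition and translation at the same time. The work examines how different classic pretrained convolutional neural networks and different multi-layer temporal attention score functions affect continuous sign language recognition. The network combines the high-level abstract features of the sign language video with low-level temporal semantics in a multi-layer temporal attention fusion module, forming more comprehensive sequence attention fusion features, so that gloss sentences are generated more accurately from continuous sign language videos. A Transformer language model then converts the recognized gloss sentences into continuous natural-language sentences suitable for sign language translation. The method is first evaluated on Tslrt, the first large-scale Chinese continuous sign language recognition and translation dataset with complex backgrounds. The proposed model is trained on the complex background environments and rich action expressions of the signers in Tslrt, and a series of baseline results is obtained through comparative experiments. On continuous sign language recognition and translation, the best word error rates are 4.8% and 5.1%, respectively. To further validate the method, it is also evaluated on another public Chinese continuous sign language recognition dataset, Chinese-CSL, and compared with 13 other published methods. The proposed method achieves the best word error rate of the compared methods, 1.8%, which demonstrates its effectiveness.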
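The results above are reported as word error rate (WER). As a reading aid, the following is a minimal sketch of the standard WER definition: the token-level edit distance (substitutions, deletions, insertions) between a reference and a predicted sequence, divided by the reference length. It is not taken from the paper; the function name `word_error_rate` and the example gloss tokens are hypothetical, and the authors' exact evaluation script may differ in tokenization or normalization.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Token-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if len(reference) == 0:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j tokens of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]  # deleting all i reference tokens to match an empty hypothesis
        for j, hyp_tok in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_tok == hyp_tok else 1)
            deletion = prev[j] + 1        # reference token missing from hypothesis
            insertion = curr[j - 1] + 1   # extra token in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)


if __name__ == "__main__":
    # Made-up gloss sequences, used only to exercise the function.
    ref = ["I", "GO", "SCHOOL", "TODAY"]
    hyp = ["I", "WALK", "SCHOOL"]
    print(word_error_rate(ref, hyp))  # one substitution + one deletion over 4 tokens
```

Because insertions are counted, WER can exceed 100% when the prediction is much longer than the reference; lower values are better.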