Computer Science ›› 2023, Vol. 50 ›› Issue (9): 184-191. doi: 10.11896/jsjkx.221100043

• Database & Big Data & Data Science •


Sign Language Animation Splicing Model Based on LpTransformer Network

HUANG Hanqiang1,2, XING Yunbing2,3, SHEN Jianfei2,3, FAN Feiyi2   

  1 Henan Institute of Advanced Technology,Zhengzhou University,Zhengzhou 450000,China
    2 Institute of Computing Technology,Chinese Academy of Sciences,Beijing 100000,China
    3 Shandong Industrial Technology Research Institute Intelligent Computing Research Institute,Jinan 250000,China
  • Received:2022-11-07 Revised:2023-02-28 Online:2023-09-15 Published:2023-09-01
  • Corresponding author: XING Yunbing(xingyunbing@ict.ac.cn)
  • About author:HUANG Hanqiang,born in 1998,postgraduate(893586949@qq.com).His main research interests include graphic image processing and sign language processing.
    XING Yunbing,born in 1982,master,senior engineer.His main research interests include sign language and human-computer interaction.
  • Supported by:
    National Key Research and Development Program of China(2018YFC2002603).


Abstract: Sign language animation splicing is a hot topic.With the continuous development of machine learning technology,especially the gradual maturity of deep learning related technologies,the speed and quality of sign language animation splicing are constantly improving.When sign language words are spliced into sentences,the corresponding animations also need to be spliced.Traditional algorithms use a distance loss to find the best splicing position and use linear or spherical interpolation to generate transition frames.Such splicing algorithms not only have obvious defects in efficiency and flexibility,but also generate unnatural sign language animation.To solve these problems,the LpTransformer model is proposed to predict the splicing position and generate transition frames.Experimental results show that the transition-frame prediction accuracy of LpTransformer reaches 99%,outperforming the ConvS2S,LSTM and Transformer models,and its splicing speed is five times faster than Transformer's,so it can achieve real-time splicing.
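For context, the sketch below illustrates the traditional splicing baseline that the abstract describes: a distance loss selects the splice point between two motion clips, and spherical linear interpolation (slerp) fills in the transition frames. The pose representation (per-joint unit quaternions), window size, and transition length are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of the traditional baseline: distance-loss splice-point
# search plus slerp-generated transition frames. Shapes are assumptions.
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between arrays of unit quaternions."""
    dot = np.sum(q0 * q1, axis=-1, keepdims=True)
    q1 = np.where(dot < 0.0, -q1, q1)            # take the shorter arc
    theta = np.arccos(np.clip(np.abs(dot), -1.0, 1.0))
    sin_theta = np.maximum(np.sin(theta), 1e-6)  # sin(x) ~ x near 0, so this
    w0 = np.sin((1.0 - t) * theta) / sin_theta   # degrades gracefully to lerp
    w1 = np.sin(t * theta) / sin_theta
    q = w0 * q0 + w1 * q1
    return q / np.linalg.norm(q, axis=-1, keepdims=True)

def find_splice_point(clip_a, clip_b, window=10):
    """Search the last `window` frames of clip_a and the first `window`
    frames of clip_b for the frame pair with minimal pose distance."""
    best, best_cost = (len(clip_a) - 1, 0), np.inf
    for i in range(len(clip_a) - window, len(clip_a)):
        for j in range(window):
            cost = np.linalg.norm(clip_a[i] - clip_b[j])  # distance loss
            if cost < best_cost:
                best_cost, best = cost, (i, j)
    return best

def splice(clip_a, clip_b, n_transition=8, window=10):
    """Cut both clips at the best splice point and bridge them with
    slerp-generated transition frames."""
    i, j = find_splice_point(clip_a, clip_b, window)
    ts = np.linspace(0.0, 1.0, n_transition + 2)[1:-1]  # interior steps only
    transition = np.stack([slerp(clip_a[i], clip_b[j], t) for t in ts])
    return np.concatenate([clip_a[:i + 1], transition, clip_b[j:]])

# Toy example: two clips of 40 frames, 20 joints, one unit quaternion each.
rng = np.random.default_rng(0)
a = rng.normal(size=(40, 20, 4)); a /= np.linalg.norm(a, axis=-1, keepdims=True)
b = rng.normal(size=(40, 20, 4)); b /= np.linalg.norm(b, axis=-1, keepdims=True)
print(splice(a, b).shape)  # kept frames of a + 8 transition frames + kept frames of b
```

The nested search and fixed interpolation are exactly the efficiency and flexibility bottlenecks the abstract criticizes: the splice point is chosen by brute force, and the transition is a geodesic blend that ignores motion dynamics.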
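By contrast, the learned approach replaces both steps with one network that predicts the splicing position and regresses the transition frames directly. The following is a generic encoder-with-two-heads sketch of that idea; the abstract does not specify LpTransformer's architecture, so every layer size, head design, and name here is an assumption rather than the authors' model.

```python
# Generic illustration (NOT the paper's LpTransformer): a transformer encoder
# reads context frames from both clips, one head classifies the splice
# position, another regresses the transition-frame poses. Sizes are assumed.
import torch
import torch.nn as nn

class TransitionTransformer(nn.Module):
    def __init__(self, pose_dim=80, d_model=128, n_transition=8, window=10):
        super().__init__()
        self.embed = nn.Linear(pose_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Head 1: classify which of the `window` candidate frames to splice at.
        self.splice_head = nn.Linear(d_model, window)
        # Head 2: regress the poses of all transition frames at once.
        self.frame_head = nn.Linear(d_model, n_transition * pose_dim)
        self.n_transition, self.pose_dim = n_transition, pose_dim

    def forward(self, ctx_a, ctx_b):
        # ctx_a, ctx_b: (batch, window, pose_dim) context frames taken from
        # the end of clip A and the start of clip B.
        x = torch.cat([ctx_a, ctx_b], dim=1)      # (batch, 2*window, pose_dim)
        h = self.encoder(self.embed(x))           # (batch, 2*window, d_model)
        pooled = h.mean(dim=1)                    # crude global pooling
        splice_logits = self.splice_head(pooled)  # (batch, window)
        frames = self.frame_head(pooled).view(-1, self.n_transition, self.pose_dim)
        return splice_logits, frames

# Toy forward pass: batch of 2, 10 context frames per clip, 80-dim flat poses.
model = TransitionTransformer()
a, b = torch.randn(2, 10, 80), torch.randn(2, 10, 80)
logits, frames = model(a, b)
print(logits.shape, frames.shape)  # torch.Size([2, 10]) torch.Size([2, 8, 80])
```

A single non-autoregressive forward pass like this is one plausible reason a learned splicer can beat a standard autoregressive Transformer decoder on speed, which is consistent with the real-time claim in the abstract.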

Key words: Sign language animation splicing, Deep learning, LpTransformer, Splicing position, Transition frames

CLC Number: TP183

References:
[1]ZHU T T.The research of Chinese sign language video synthesis aided by 3D information [D].Beijing:Beijing University of Technology,2014.
[2]CHEN J X.Study on key technologies of the Chinese sign language synthesis based on the video stitching [D].Hefei:University of Science and Technology of China,2017.
[3]ZHAO H N.Chinese sign language news broadcasting system based on virtual human technology [D].Harbin:Harbin Institute of Technology,2008.
[4]DUARTE A C.Cross-modal neural sign language translation[C]//Proceedings of the 27th ACM International Conference on Multimedia.Nice:ACM,2019:1650-1654.
[5]KAPOOR P,MUKHOPADHYAY R,HEGDE S B,et al.Towards Automatic Speech to Sign Language Generation[C]//Interspeech 2021,22nd Annual Conference of the International Speech Communication Association.Brno:ISCA,2021:3700-3704.
[6]XIAO Q,QIN M,YIN Y.Skeleton-based Chinese sign language recognition and generation for bidirectional communication between deaf and hearing people[J].Neural Networks,2020,125:41-55.
[7]SAUNDERS B,CAMGOZ N C,BOWDEN R.Progressive transformers for end-to-end sign language production[C]//European Conference on Computer Vision.Glasgow:Springer,2020:687-705.
[8]ZELINKA J,KANIS J.Neural sign language synthesis:Words are our glosses[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision.Snowmass Village:IEEE,2020:3395-3403.
[9]SAUNDERS B,CAMGOZ N C,BOWDEN R.Continuous 3D multi-channel sign language production via progressive transformers and mixture density networks[J].International Journal of Computer Vision,2021,129(7):2113-2135.
[10]HUANG W,PAN W,ZHAO Z,et al.Towards Fast and High-Quality Sign Language Production[C]//Proceedings of the 29th ACM International Conference on Multimedia.China:ACM,2021:3172-3181.
[11]ZHOU C,LAI Z,WANG S,et al.Learning a deep motion interpolation network for human skeleton animations[J].Computer Animation and Virtual Worlds,2021,32(3/4):e2003.
[12]SAUNDERS B,CAMGOZ N C,BOWDEN R.Skeletal Graph Self-Attention:Embedding a Skeleton Inductive Bias into Sign Language Production[J].arXiv:2112.05277,2021.
[13]ZHANG Z,XUE W,HUANG W,et al.Effective Video Frame Acquisition for Image Stitching[J].IEEE Access,2020,8:217086-217097.
[14]LIU Q,SU X,ZHANG L,et al.Panoramic video stitching of dual cameras based on spatio-temporal seam optimization[J].Multimedia Tools and Applications,2020,79(5):3107-3124.
[15]VASUHI S,SAMYDURAI A,VIJAYAKUMAR M.Multicamera Video Stitching for Multiple Human Tracking[J].International Journal of Computer Vision and Image Processing(IJCVIP),2021,11(1):17-38.
[16]CAO W.Applying image registration algorithm combined with CNN model to video image stitching[J].The Journal of Supercomputing,2021,77(12):13879-13896.
[17]DAS A,RAUN E S K,KJÆRGAARD M B.Cam-stitch:Trajectory cavity stitching method for stereo vision cameras in a public building[C]//Proceedings of the First International Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things.New York:Association for Computing Machinery,2019:8-14.
[18]GEHRING J,AULI M,GRANGIER D,et al.Convolutional sequence to sequence learning[C]//International Conference on Machine Learning.Sydney:PMLR,2017:1243-1252.
[19]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[J].arXiv:1706.03762,2017.