Computer Science (计算机科学), 2023, Vol. 50, Issue (9): 184-191. doi: 10.11896/jsjkx.221100043
HUANG Hanqiang (黄涵强)1,2, XING Yunbing (邢云冰)2,3, SHEN Jianfei (沈建飞)2,3, FAN Feiyi (范非易)2
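Abstract: Sign language animation splicing is an active research topic. With the continuing development of machine learning, and in particular the maturing of deep learning techniques, the speed and quality of sign language animation splicing have steadily improved. When sign language words are spliced into sentences, the corresponding animations must be spliced as well. Traditional algorithms search for the best splicing point using a distance loss and generate transition frames by linear or spherical interpolation. Such algorithms have clear shortcomings in efficiency and flexibility, and the transition frames they produce look unnatural. To address these problems, the LpTransformer model is proposed to predict the splicing position and generate the transition frames. Experiments show that LpTransformer reaches a transition-frame prediction accuracy of 99%, outperforming the ConvS2S, LSTM, and Transformer models, and that it splices 5 times faster than Transformer. The proposed model can therefore perform splicing in real time.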
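The abstract does not spell out the traditional baseline, so the following is only a minimal sketch of the general idea (a distance-based search for the splicing point, followed by spherical interpolation of transition frames). It assumes each frame is a list of per-joint unit quaternions `(w, x, y, z)`. The names `slerp`, `pose_distance`, `best_splice_point`, and `splice`, the parameters `window` and `n_transition`, and the choice of distance measure are all hypothetical and do not come from the paper. LpTransformer itself is not shown.

```python
import math


def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    # q and -q encode the same rotation; flip one to follow the shorter arc.
    if dot < 0.0:
        q1 = tuple(-c for c in q1)
        dot = -dot
    dot = min(1.0, dot)  # guard acos against rounding error
    theta = math.acos(dot)
    if theta < 1e-6:
        # Nearly identical rotations: linear interpolation is numerically safer.
        out = tuple(a + t * (b - a) for a, b in zip(q0, q1))
    else:
        s = math.sin(theta)
        w0 = math.sin((1.0 - t) * theta) / s
        w1 = math.sin(t * theta) / s
        out = tuple(w0 * a + w1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in out))
    return tuple(c / norm for c in out)


def pose_distance(pose_a, pose_b):
    """Assumed distance: sum over joints of 1 - |<qa, qb>| (0 for equal rotations)."""
    return sum(1.0 - abs(sum(a * b for a, b in zip(qa, qb)))
               for qa, qb in zip(pose_a, pose_b))


def best_splice_point(clip_a, clip_b, window):
    """Exhaustively compare the last `window` frames of clip_a with the first
    `window` frames of clip_b and return the indices (i, j) of the closest pair."""
    start_a = max(0, len(clip_a) - window)
    candidates = [(pose_distance(clip_a[i], clip_b[j]), i, j)
                  for i in range(start_a, len(clip_a))
                  for j in range(min(window, len(clip_b)))]
    _, i, j = min(candidates)
    return i, j


def splice(clip_a, clip_b, window=3, n_transition=2):
    """Cut both clips at the best splice point and bridge them with slerp frames."""
    i, j = best_splice_point(clip_a, clip_b, window)
    transition = []
    for k in range(1, n_transition + 1):
        t = k / (n_transition + 1)
        transition.append([slerp(qa, qb, t)
                           for qa, qb in zip(clip_a[i], clip_b[j])])
    return clip_a[:i + 1] + transition + clip_b[j:]
```

In this sketch the search cost grows with the square of `window`, and the transition frames follow a fixed geometric path between the two poses regardless of the signs being joined. Those two properties are consistent with the efficiency and naturalness concerns the abstract raises about interpolation-based splicing.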