Computer Science, 2022, Vol. 49, Issue (11): 156-162. doi: 10.11896/jsjkx.220600036

• Computer Graphics & Multimedia

Temporal Relation Guided Knowledge Distillation for Continuous Sign Language Recognition

XIAO Zheng-ye¹, LIN Shi-quan¹, WAN Xiu-an¹, FANG Yu-chun¹, NI Lan²

  ¹ School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
    ² College of Liberal Arts, Shanghai University, Shanghai 200444, China
  • Received: 2022-06-03  Revised: 2022-08-02  Online: 2022-11-15  Published: 2022-11-03
  • About the authors: XIAO Zheng-ye, born in 1996, bachelor's degree. His main research interests include machine learning and computer vision.
    FANG Yu-chun, born in 1975, Ph.D., professor. Her main research interests include machine learning, multimedia, pattern recognition and image processing.
  • Supported by:
    National Natural Science Foundation of China (61976132, 61991411, U1811461), Natural Science Foundation of Shanghai, China (19ZR1419200), and Shanghai Engineering Research Center of Intelligent Computing System (19DZ2252600).

Abstract: Previous research on continuous sign language recognition has focused mainly on the RGB modality and achieves remarkable performance on both real-world and laboratory datasets, but it is usually computationally expensive. The skeleton modality, on the other hand, has compact inputs and fast computation, but performs poorly on real-world datasets. This paper proposes a cross-modal knowledge distillation method named temporally related knowledge distillation (TRKD) to ease the trade-off between the RGB and skeleton modalities in accuracy and computation speed. TRKD uses the RGB network as a teacher to guide the skeleton network, so that the skeleton model can be both fast and accurate. We observe that the teacher's understanding of sign language context is worth transferring to the student. To this end, we employ a graph convolutional network (GCN) to learn and align the temporally related features of the teacher and student networks. Moreover, because the GCN applied to the teacher network has learnable parameters, the teacher's output does not provide fixed supervision that conventional loss functions can use; we therefore introduce contrastive learning to provide a self-supervised signal. Multiple ablation experiments on the Phoenix-2014 dataset demonstrate the effectiveness of the proposed method.
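The abstract combines three ingredients: a GCN over the temporal (frame) dimension, alignment of teacher and student features, and a contrastive objective in place of a conventional distillation loss. The plain-Python sketch below illustrates how these pieces can fit together. It is not the authors' TRKD implementation: the window-based temporal graph, one GCN layer per network with symmetric normalization (following Kipf and Welling), frame-level positives for an InfoNCE loss (following van den Oord et al.), cosine similarity, and the temperature `tau=0.1` are all assumptions, and every function name here is hypothetical.

```python
import math
import random


def temporal_adjacency(num_frames, window=1):
    """Hypothetical temporal graph: frame i links to frames within `window` steps (self-loops included)."""
    return [[1.0 if abs(i - j) <= window else 0.0 for j in range(num_frames)]
            for i in range(num_frames)]


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; `adj` already contains self-loops."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Dense matrix product of nested lists: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(a_hat, h, w):
    """One GCN layer: ReLU(A_hat @ H @ W), mixing each frame with its temporal neighbours."""
    return [[max(0.0, x) for x in row] for row in matmul(matmul(a_hat, h), w)]


def cosine(u, v, eps=1e-8):
    """Cosine similarity; `eps` keeps all-zero rows (possible after ReLU) well defined."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + eps)


def info_nce(student, teacher, tau=0.1):
    """Frame-level InfoNCE: student frame t must pick teacher frame t among all teacher frames."""
    n = len(student)
    total = 0.0
    for t in range(n):
        logits = [cosine(student[t], teacher[j]) / tau for j in range(n)]
        m = max(logits)  # log-sum-exp with max subtraction for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[t]
    return total / n


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    frames, dim = 6, 8
    a_hat = normalize_adjacency(temporal_adjacency(frames, window=1))
    teacher_feats = random_matrix(frames, dim, rng)  # stand-in for RGB teacher frame features
    student_feats = random_matrix(frames, dim, rng)  # stand-in for skeleton student frame features
    w_t = random_matrix(dim, dim, rng)  # teacher-side GCN weights (learnable during training)
    w_s = random_matrix(dim, dim, rng)  # student-side GCN weights
    z_t = gcn_layer(a_hat, teacher_feats, w_t)
    z_s = gcn_layer(a_hat, student_feats, w_s)
    print("contrastive alignment loss:", info_nce(z_s, z_t))
```

Each student frame is pulled toward the teacher's feature for the same frame and pushed away from the teacher's other frames. This is one way to read the abstract's argument for contrastive learning: because both sides pass through learnable GCNs, a plain regression loss such as MSE between `z_s` and `z_t` could be driven toward zero by collapsing both outputs to a constant, whereas the negatives in InfoNCE rule out that trivial solution.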

Key words: Knowledge distillation, Graph convolutional network, Sign language recognition

CLC Number: TP311