Cross-Modal Knowledge Distillation for Continuous Sign Language Recognition Based on Temporal Information Alignment

doi:10.11896/jsjkx.220600036

Computer Science, 2022, Vol. 49, Issue (11): 156-162. doi: 10.11896/jsjkx.220600036

• Computer Graphics & Multimedia •

Temporal Relation Guided Knowledge Distillation for Continuous Sign Language Recognition

XIAO Zheng-ye¹, LIN Shi-quan¹, WAN Xiu-an¹, FANG Yu-chun¹, NI Lan²

1 School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
2 College of Liberal Arts, Shanghai University, Shanghai 200444, China

Received: 2022-06-03 Revised: 2022-08-02 Online: 2022-11-15 Published: 2022-11-03
About the authors: XIAO Zheng-ye, born in 1996, bachelor. His main research interests include machine learning and computer vision.
FANG Yu-chun, born in 1975, Ph.D., professor. Her main research interests include machine learning, multimedia, pattern recognition and image processing.
Supported by:
National Natural Science Foundation of China (61976132, 61991411, U1811461), Natural Science Foundation of Shanghai, China (19ZR1419200) and Shanghai Engineering Research Center of Intelligent Computing System (19DZ2252600).

Abstract

Previous research on continuous sign language recognition has focused mainly on the RGB modality, which achieves remarkable performance on both real-world and laboratory datasets but usually at a high computational cost. The skeleton modality, by contrast, has small inputs and computes quickly, but performs poorly on real-world datasets. This paper proposes a cross-modal knowledge distillation method, temporally related knowledge distillation (TRKD), to ease this trade-off between the accuracy of the RGB modality and the speed of the skeleton modality. TRKD uses an RGB-modality network as a teacher to guide a skeleton-modality student network, so that recognition is both fast and accurate. We observe that the teacher's understanding of sign language context is worth transferring to the student. To this end, we employ a graph convolutional network (GCN) to learn the temporally related features of the teacher and student networks and align them. Moreover, because the GCN in the teacher network has learnable parameters, the teacher's supervisory information cannot be used directly with traditional loss functions; we therefore introduce contrastive learning to provide self-supervised information. Multiple ablation experiments on the Phoenix-2014 dataset demonstrate the effectiveness of the proposed method.
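
The alignment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the graph construction, layer sizes, or the exact contrastive objective, so the sketch assumes frames are graph nodes linked to their temporal neighbours, uses a single GCN layer per network, and uses an InfoNCE-style loss in which frame t of the student should match frame t of the teacher. All function names, dimensions, and the temperature are hypothetical.

```python
import numpy as np

def temporal_adjacency(num_frames, window=1):
    """Symmetrically normalized adjacency linking each frame to neighbours
    within `window` steps, self-loops included (an assumed graph layout)."""
    idx = np.arange(num_frames)
    adj = (np.abs(idx[:, None] - idx[None, :]) <= window).astype(np.float64)
    deg_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]

def gcn_layer(features, adj_norm, weight):
    """One graph-convolution step: aggregate neighbours, project, apply ReLU."""
    return np.maximum(adj_norm @ features @ weight, 0.0)

def info_nce(student, teacher, temperature=0.1):
    """InfoNCE-style loss: the positive pair is (student frame t, teacher frame t);
    all other teacher frames in the sequence act as negatives."""
    s = student / (np.linalg.norm(student, axis=1, keepdims=True) + 1e-8)
    t = teacher / (np.linalg.norm(teacher, axis=1, keepdims=True) + 1e-8)
    logits = s @ t.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# Toy usage with random stand-ins for per-frame RGB (teacher) and
# skeleton (student) features; the sizes are illustrative only.
rng = np.random.default_rng(0)
T, d_teacher, d_student, d_shared = 8, 16, 12, 6
teacher_feats = rng.normal(size=(T, d_teacher))
student_feats = rng.normal(size=(T, d_student))
adj = temporal_adjacency(T)
teacher_rel = gcn_layer(teacher_feats, adj, rng.normal(size=(d_teacher, d_shared)))
student_rel = gcn_layer(student_feats, adj, rng.normal(size=(d_student, d_shared)))
loss = info_nce(student_rel, teacher_rel)
```

In this sketch the two GCN layers project features of different widths into a shared space, which is one plausible reason a contrastive objective is convenient here: it compares relative similarities rather than requiring a fixed teacher target.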

Key words: Knowledge distillation, Graph convolutional network, Sign language recognition

CLC Number: TP311

XIAO Zheng-ye, LIN Shi-quan, WAN Xiu-an, FANG Yu-chun, NI Lan. Temporal Relation Guided Knowledge Distillation for Continuous Sign Language Recognition[J]. Computer Science, 2022, 49(11): 156-162.

