计算机科学 (Computer Science), 2022, Vol. 49, Issue 11: 156-162. doi: 10.11896/jsjkx.220600036
肖正业1, 林世铨1, 万修安1, 方昱春1, 倪兰2
XIAO Zheng-ye1, LIN Shi-quan1, WAN Xiu-an1, FANG Yu-chun1, NI Lan2
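Abstract: In recent years, research on continuous sign language recognition has centered on RGB data and has made significant progress on both real-world and laboratory-collected datasets. However, processing the RGB modality places heavy demands on device computing power. The skeleton keypoint modality has lower input complexity and is therefore faster to process, but its recognition performance is weaker than that of RGB. To combine the strengths of both, this paper proposes a cross-modal knowledge distillation method based on the alignment of temporally related information, Temporally Related Knowledge Distillation (TRKD). The method uses a neural network on the RGB modality as the teacher to guide a student network on the skeleton keypoint modality, so that continuous sign language recognition is both fast and accurate. Because the teacher network's understanding of sign language context is well worth learning for the student, a graph convolutional network with prior information and an adaptive learning method is proposed to extract temporally related features from the two modalities, and teaching is carried out through feature alignment. During feature alignment, introducing learnable parameters into the teacher network causes the supervision provided by the teacher to be lost. To solve this problem, TRKD introduces contrastive learning from self-supervised learning to provide the supervision signal, thereby aligning the teacher and student networks on temporally related features. Several distillation tasks are conducted on the Phoenix-2014 sign language dataset to verify the effectiveness of the proposed method.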
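The abstract does not give the exact formulation, so the following is only a minimal NumPy sketch of the two ideas it names: a graph convolution over time steps that combines a prior adjacency with an adaptive one, and a contrastive loss that aligns student features with teacher features. The window-based prior, the additive combination of the two adjacencies, the InfoNCE form of the loss, and all function and parameter names are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def temporal_prior_adjacency(T, window=1):
    """Assumed prior: each frame is linked to frames within `window` steps."""
    idx = np.arange(T)
    A = (np.abs(idx[:, None] - idx[None, :]) <= window).astype(float)
    return A / A.sum(axis=1, keepdims=True)  # row-normalise

def temporal_graph_conv(X, A_prior, A_adaptive, W):
    """One graph convolution over time: ReLU((A_prior + A_adaptive) X W).

    X is (T, d_in); A_adaptive stands in for a learnable (T, T) matrix.
    """
    return np.maximum((A_prior + A_adaptive) @ X @ W, 0.0)

def info_nce(student, teacher, temperature=0.1):
    """InfoNCE loss: row i of `student` is the positive for row i of `teacher`;
    all other teacher rows act as negatives."""
    s = student / (np.linalg.norm(student, axis=1, keepdims=True) + 1e-8)
    t = teacher / (np.linalg.norm(teacher, axis=1, keepdims=True) + 1e-8)
    logits = s @ t.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Toy usage with random stand-ins for RGB (teacher) and skeleton (student) features.
rng = np.random.default_rng(0)
T, d_rgb, d_skel, d_out = 8, 16, 6, 4
A = temporal_prior_adjacency(T)
A_adapt = np.zeros((T, T))  # a trained model would learn this matrix
teacher_feat = temporal_graph_conv(rng.normal(size=(T, d_rgb)), A, A_adapt,
                                   rng.normal(size=(d_rgb, d_out)))
student_feat = temporal_graph_conv(rng.normal(size=(T, d_skel)), A, A_adapt,
                                   rng.normal(size=(d_skel, d_out)))
loss = info_nce(student_feat, teacher_feat)
```

In this sketch the loss depends only on which teacher and student rows are paired, not on a fixed target vector. This is one way to read the abstract's point that contrastive learning can still supply supervision when the teacher side carries learnable parameters.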