Computer Science ›› 2022, Vol. 49 ›› Issue (7): 79-88. doi: 10.11896/jsjkx.210600028

• Computer Graphics & Multimedia •

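Survey on Action Quality Assessment Methods in Video Understanding

ZHANG Hong-bo1, DONG Li-jia1, PAN Yu-biao2, HSIAO Tsung-chih2, ZHANG Hui-zhen2, DU Ji-xiang2,3

  1 School of Computer Science and Technology, Huaqiao University, Xiamen, Fujian 361000, China
  2 Fujian Key Laboratory of Big Data Intelligence and Security, Huaqiao University, Xiamen, Fujian 361000, China
  3 Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen, Fujian 361000, China
  • Received: 2021-06-02 Revised: 2021-10-20 Online: 2022-07-15 Published: 2022-07-12
  • Corresponding author: ZHANG Hong-bo (zhanghongbo@hqu.edu.cn)
  • Supported by:
    National Natural Science Foundation of China (61871196), Natural Science Foundation of Fujian Province, China (2019J01082), and Promotion Program for Young and Middle-aged Teachers in Science and Technology Research of Huaqiao University (ZQN-YX601).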

  • About author: ZHANG Hong-bo, born in 1986, Ph.D., associate professor and master's supervisor, is a member of the China Computer Federation. His main research interests include computer vision, machine learning, and video understanding.

Abstract: Action quality assessment evaluates the quality of actions performed by people in video, for example by computing a quality score, assigning a grade, or ranking the performance of different people. It is an important direction in video understanding and computer vision research. This paper summarizes the main action quality assessment methods from three perspectives: quality score prediction, grade classification, and level ranking. It then analyzes the performance of these methods on commonly used public datasets. Finally, it discusses the open problems that future research most urgently needs to address.

Key words: Video understanding, Action quality assessment, Quality score prediction, Grade classification, Level ranking

CLC Number: TP391.41