Survey on Action Quality Assessment Methods in Video Understanding

doi:10.11896/jsjkx.210600028

Abstract

Action quality assessment evaluates how well a person performs an action in a video, for example by predicting a quality score, assigning a skill level, or comparing the performance of different people. It is an important research direction in video understanding and computer vision. This paper summarizes the main methods of action quality assessment, including quality score prediction, level classification, and ranking methods. The performance of these methods on public datasets is also analyzed. Finally, open challenges for future research are discussed.
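The three families of methods named in the abstract differ mainly in the training objective they optimize. The sketch below illustrates one common objective for each family. The specific choices (mean squared error, softmax cross-entropy, and a margin-based hinge loss) and all function names are assumptions made for illustration; they are not taken from the surveyed papers.

```python
import math


def score_regression_loss(predicted, target):
    """Score prediction: mean squared error between predicted and judged scores."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def grade_classification_loss(logits, grade_index):
    """Level classification: softmax cross-entropy for one sample.

    `logits` holds one unnormalized value per skill grade and
    `grade_index` is the position of the annotated grade.
    """
    peak = max(logits)  # subtract the maximum for numerical stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[grade_index]


def pairwise_ranking_loss(score_better, score_worse, margin=1.0):
    """Ranking: hinge loss that reaches zero once the better performance
    outscores the worse one by at least `margin` (an assumed hyperparameter)."""
    return max(0.0, margin - (score_better - score_worse))


if __name__ == "__main__":
    print(score_regression_loss([80.0, 65.0], [82.0, 65.0]))
    print(grade_classification_loss([0.0, 0.0, 0.0], 1))
    print(pairwise_ranking_loss(3.0, 1.0))
    print(pairwise_ranking_loss(1.0, 3.0))
```

Score prediction needs absolute score annotations, level classification needs only a discrete grade per video, and ranking needs only relative judgments between pairs of performances, which is why the three formulations are usually treated separately.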

Key words: Video understanding, Action quality assessment, Quality score prediction, Grade classification, Level ranking

CLC Number: TP391.41

ZHANG Hong-bo, DONG Li-jia, PAN Yu-biao, HSIAO Tsung-chih, ZHANG Hui-zhen, DU Ji-xiang. Survey on Action Quality Assessment Methods in Video Understanding [J]. Computer Science, 2022, 49(7): 79-88.

