Computer Science (计算机科学) ›› 2022, Vol. 49 ›› Issue (7): 79-88. doi: 10.11896/jsjkx.210600028
张洪博1, 董力嘉1, 潘玉彪2, 萧宗志2, 张惠臻2, 杜吉祥2,3
ZHANG Hong-bo1, DONG Li-jia1, PAN Yu-biao2, HSIAO Tsung-chih2, ZHANG Hui-zhen2, DU Ji-xiang2,3
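Abstract: Action quality assessment in video refers to evaluating how well the people in a video perform an action, for example by computing an action quality score, assigning a grade, or ranking the performances of different people. It is an important direction in video understanding and computer vision research. This paper summarizes action quality assessment methods for video from three perspectives: score prediction, grade classification, and skill-level ranking. It then analyzes how these methods perform on the datasets in common use today, and finally discusses open problems that future research urgently needs to address.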