Computer Science, 2022, Vol. 49, Issue (11A): 211200025-10. DOI: 10.11896/jsjkx.211200025
• Image Processing & Multimedia Technology •
QIAN Wen-xiang1,3, YI Yang1,2,3
[1] DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]//IEEE Conference on Computer Vision and Pattern Recognition. San Diego, USA, 2005: 886-893.
[2] CHAUDHRY R, RAVICHANDRAN A, HAGER G, et al. Histograms of oriented optical flow and Binet-Cauchy kernels on nonlinear dynamical systems for the recognition of human actions[C]//IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA, 2009: 1932-1939.
[3] WANG H, KLASER A, SCHMID C, et al. Dense trajectories and motion boundary descriptors for action recognition[J]. International Journal of Computer Vision, 2013, 103(1): 61-79.
[4] LAZEBNIK S, SCHMID C, PONCE J. Beyond bags of features: spatial pyramid matching for recognizing natural scene categories[C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York, USA, 2006: 2169-2178.
[5] YANG M, ZHANG L, FENG X, et al. Sparse representation based Fisher discrimination dictionary learning for image classification[J]. International Journal of Computer Vision, 2014, 109(3): 209-232.
[6] HINTON G E. Learning multiple layers of representation[J]. Trends in Cognitive Sciences, 2007, 11(10): 428-434.
[7] DENG L, YU D. Deep learning: methods and applications[J]. Foundations and Trends in Signal Processing, 2014, 7(3/4): 197-387.
[8] SCHMIDHUBER J. Deep learning in neural networks: an overview[J]. Neural Networks, 2015, 61(1): 85-117.
[9] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012, 25(1): 1097-1105.
[10] KARPATHY A, TODERICI G, SHETTY S, et al. Large-scale video classification with convolutional neural networks[C]//IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA, 2014: 1725-1732.
[11] MATERZYNSKA J, XIAO T, HERZIG R, et al. Something-Else: compositional action recognition with spatial-temporal interaction networks[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual, 2020: 1049-1059.
[12] SOOMRO K, ZAMIR A R, SHAH M. UCF101: a dataset of 101 human actions classes from videos in the wild[J]. arXiv:1212.0402, 2012.
[13] KAY W, CARREIRA J, SIMONYAN K, et al. The Kinetics human action video dataset[J]. arXiv:1705.06950, 2017.
[14] GU C, CHEN S, DAVID A R, et al. AVA: a video dataset of spatio-temporally localized atomic visual actions[C]//IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA, 2018: 6047-6056.
[15] LI A, THOTAKURI M, ROSS D A, et al. The AVA-Kinetics localized human actions video dataset[J]. arXiv:2005.00214, 2020.
[16] KUEHNE H, JHUANG H, GARROTE E, et al. HMDB: a large video database for human motion recognition[C]//IEEE International Conference on Computer Vision. Barcelona, Spain, 2011: 2556-2563.
[17] SIGURDSSON G A, VAROL G, WANG X, et al. Hollywood in homes: crowdsourcing data collection for activity understanding[C]//European Conference on Computer Vision. Cham, Switzerland, 2016: 510-526.
[18] GUNNAR A S, ABHINAV G, CORDELIA S, et al. Charades-Ego: a large-scale dataset of paired third and first person videos[J]. arXiv:1804.09626, 2018.
[19] DAMEN D, DOUGHTY H, FARINELLA G M, et al. Rescaling egocentric vision: collection, pipeline and challenges for EPIC-KITCHENS-100[J]. International Journal of Computer Vision, 2022, 130(1): 33-55.
[20] DAMEN D, DOUGHTY H, FARINELLA G M, et al. Scaling egocentric vision: the EPIC-KITCHENS dataset[C]//European Conference on Computer Vision. Munich, Germany, 2018: 720-736.
[21] DAMEN D, DOUGHTY H, FARINELLA G M, et al. The EPIC-KITCHENS dataset: collection, challenges and baselines[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020(1): 1-1.
[22] ABU-EL-HAIJA S, KOTHARI N, LEE J, et al. YouTube-8M: a large-scale video classification benchmark[J]. arXiv:1609.08675, 2016.
[23] ANTIPOV G, BERRANI S A, RUCHAUD N, et al. Learned vs. hand-crafted features for pedestrian gender recognition[C]//23rd ACM International Conference on Multimedia. New York, USA, 2015: 1263-1266.
[24] KLASER A, MARSZALEK M, SCHMID C. A spatio-temporal descriptor based on 3D-gradients[C]//19th British Machine Vision Conference. Leeds, UK, 2008: 1-10.
[25] WANG H, KLASER A, SCHMID C, et al. Action recognition by dense trajectories[C]//IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs, USA, 2011: 3169-3176.
[26] WANG H, SCHMID C. Action recognition with improved trajectories[C]//IEEE International Conference on Computer Vision. Sydney, Australia, 2013: 3551-3558.
[27] TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]//IEEE International Conference on Computer Vision. Santiago, Chile, 2015: 4489-4497.
[28] HUANG K, DELANY S J, MCKEEVER S. Human action recognition in videos using transfer learning[C]//Irish Machine Vision and Image Processing Conference. Dublin, Ireland, 2019.
[29] ZHANG Z, SEJDIC E. Radiological images and machine learning: trends, perspectives, and prospects[J]. Computers in Biology and Medicine, 2019, 108(1): 354-370.
[30] HINTON G E. Deep belief networks[J]. Scholarpedia, 2009, 4(5): 5947.
[31] TAYLOR G W, HINTON G E. Factored conditional restricted Boltzmann machines for modeling motion style[C]//The 26th Annual International Conference on Machine Learning. New York, USA, 2009: 1025-1032.
[32] LAROCHELLE H, BENGIO Y. Classification using discriminative restricted Boltzmann machines[C]//The 25th International Conference on Machine Learning. New York, USA, 2008: 536-543.
[33] CHEN B. Deep learning of invariant spatio-temporal features from video[D]. British Columbia: University of British Columbia, 2010.
[34] YANG T A, SILVER D L. The disadvantage of CNN versus DBN image classification under adversarial conditions[C]//The 34th Canadian Conference on Artificial Intelligence. Vancouver, Canada, 2021.
[35] CHEN M, RADFORD A, CHILD R, et al. Generative pretraining from pixels[C]//International Conference on Machine Learning. Virtual, 2020: 1691-1703.
[36] SOCHER R, HUVAL B, BATH B, et al. Convolutional-recursive deep learning for 3D object classification[J]. Advances in Neural Information Processing Systems, 2012, 25(1): 656-664.
[37] VIJAYANARASIMHAN S, SHLENS J, MONGA R, et al. Deep networks with large output spaces[J]. arXiv:1412.7479, 2014.
[38] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[39] NG Y H, HAUSKNECHT M, VIJAYANARASIMHAN S, et al. Beyond short snippets: deep networks for video classification[C]//IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA, 2015: 4694-4702.
[40] DONAHUE J, HENDRICKS L A, ROHRBACH M, et al. Long-term recurrent convolutional networks for visual recognition and description[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 677-691.
[41] CARREIRA J, ZISSERMAN A. Quo vadis, action recognition? A new model and the Kinetics dataset[C]//IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA, 2017: 6299-6308.
[42] SIMONYAN K, ZISSERMAN A. Two-stream convolutional networks for action recognition in videos[J]. Advances in Neural Information Processing Systems, 2014, 27: 568-576.
[43] FEICHTENHOFER C, PINZ A, ZISSERMAN A. Convolutional two-stream network fusion for video action recognition[C]//IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA, 2016: 1933-1941.
[44] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. arXiv:1409.1556, 2014.
[45] LIN J, GAN C, HAN S. TSM: temporal shift module for efficient video understanding[C]//IEEE/CVF International Conference on Computer Vision. Seoul, Korea, 2019: 7083-7093.
[46] JI S, XU W, YANG M, et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(1): 221-231.
[47] FEICHTENHOFER C, FAN H, MALIK J, et al. SlowFast networks for video recognition[C]//IEEE International Conference on Computer Vision. Seoul, South Korea, 2019: 6202-6211.
[48] FEICHTENHOFER C. X3D: expanding architectures for efficient video recognition[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual, 2020: 203-213.
[49] LEE Y, KIM H I, YUN K, et al. Diverse temporal aggregation and depthwise spatiotemporal factorization for efficient video classification[J]. arXiv:2012.00317, 2020.
[50] DU T, WANG H, TORRESANI L, et al. A closer look at spatiotemporal convolutions for action recognition[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA, 2018.
[51] QIU Z, YAO T, MEI T. Learning spatio-temporal representation with pseudo-3D residual networks[C]//IEEE International Conference on Computer Vision. Venice, Italy, 2017: 5534-5542.
[52] WANG L, XIONG Y, WANG Z, et al. Temporal segment networks: towards good practices for deep action recognition[C]//European Conference on Computer Vision. Amsterdam, Netherlands, 2016: 20-36.
[53] LIU Z, LUO D, WANG Y, et al. TEINet: towards an efficient architecture for video recognition[C]//AAAI Conference on Artificial Intelligence. New York, USA, 2020: 11669-11676.
[54] LI Y, JI B, SHI X, et al. TEA: temporal excitation and aggregation for action recognition[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual, 2020: 909-918.
[55] LIU Z, WANG L, WU W, et al. TAM: temporal adaptive module for video recognition[C]//IEEE/CVF International Conference on Computer Vision. Virtual, 2021: 13708-13718.
[56] WANG L, TONG Z, JI B, et al. TDN: temporal difference networks for efficient action recognition[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual, 2021: 1895-1904.
[57] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]//Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, USA, 2019: 4171-4186.
[58] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[J]. arXiv:1706.03762, 2017.
[59] RUAN L, QIN J. Survey: transformer based video-language pre-training[J]. arXiv:2109.09920, 2021.
[60] GIRDHAR R, CARREIRA J, DOERSCH C, et al. Video action transformer network[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA, 2019: 244-253.
[61] HARA K, KATAOKA H, SATOH Y. Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet?[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA, 2018: 6546-6555.
[62] PARK J, JEON S, KIM S, et al. Learning to detect, associate, and recognize human actions and surrounding scenes in untrimmed videos[C]//The 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild. Seoul, Korea, 2018: 21-26.
[63] SEONG H, HYUN J, KIM E. Video multitask transformer network[C]//IEEE/CVF International Conference on Computer Vision Workshops. Seoul, Korea, 2019.
[64] BERTASIUS G, WANG H, TORRESANI L. Is space-time attention all you need for video understanding?[J]. arXiv:2102.05095, 2021.
[65] ARNAB A, DEHGHANI M, HEIGOLD G, et al. ViViT: a video vision transformer[C]//IEEE/CVF International Conference on Computer Vision. Virtual, 2021: 6836-6846.
[66] LIU Z, LIN Y, CAO Y, et al. Swin Transformer: hierarchical vision transformer using shifted windows[C]//IEEE/CVF International Conference on Computer Vision. Virtual, 2021: 10012-10022.
[67] KONDRATYUK D, YUAN L, LI Y, et al. MoViNets: mobile video networks for efficient video recognition[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Virtual, 2021: 16020-16030.
[68] KOOT R, HENNERBICHLER M, LU H. Evaluating transformers for lightweight action recognition[J]. arXiv:2111.09641, 2021.
[69] LANGERMAN D, JOHNSON A, BUETTNER K, et al. Beyond floating-point ops: CNN performance prediction with critical datapath length[C]//IEEE High Performance Extreme Computing Conference. Virtual, 2020: 1-9.