Computer Science ›› 2020, Vol. 47 ›› Issue (9): 123-128.doi: 10.161896/jsjkx.190800101

• Computer Graphics & Multimedia • Previous Articles     Next Articles

Action-related Network:Towards Modeling Complete Changeable Action

HE Xin1, XU Juan1,2, JIN Ying-ying1   

  1. 1 College of Computer Since,Technology,Nanjing University of Aeronautics,Astronautics,Nanjing 211100,China
    2 Key Laboratory of Computer Network and Information Integration,Ministry of Education,Southeast University,Nanjing 210096,China
  • Received:2019-08-22 Published:2020-09-10
  • About author:HE Xin,born in 1995,postgraduate,is a member of China Computer Federation.His main research interests include deep learning and action recognition.
    XU Juan,born in 1981,associate professor,is a member of China Computer Fe-deration.Her main interests include quantum computing and quantum information,cloud computing and deep lear-ning.

Abstract: When modeling the complete action in the video,the commonly used method is the temporal segment network (TSN),but TSN cannot fully obtain the action change information.In order to fully explore the change information of action in the time dimension,the Action-Related Network (ARN) is proposed.Firstly,the BN-Inception network is used to extract the features of the action in the video,and then the extracted video segmentation features are combined with the features output by the Long Short-Term Memory (LSTM),and finally classified.With the above approach,ARN can take into account both static and dyna-mic information about the action.Experiments show that on the general data set HMDB-51,the recognition accuracy of ARN is 73.33%,which is 7% higher than the accuracy of TSN.When the action information is increased,the recognition accuracy of ARN will be 10% higher than TSN.On the Something-Something V1 data set with more action changes,the recognition accuracy of ARN is 28.12%,which is 51% higher than the accuracy of TSN.Finally,in some action categories of HMDB-51 dataset,this paper further analyzes the changes of the recognition accuracy of ARN and TSN when using more complete action information res-pectively.The recognition accuracy of ARN is higher than TSN by 10 percentage points.It can be seen that ARN makes full use of the complete action information through the change of the associated action,thereby effectively improving the recognition accuracy of the change action.

Key words: Action recognition, Action-related network, Deep learning, Computer vision

CLC Number: 

  • TP391
[1] SOOMRO K,ZAMIR A R,SHAH M.UCF101:A dataset of101 human actions classes from videos in the wild[J].arXiv:1212.0402,2012.
[2] KUEHNE H,JHUANG H,GARROTE E,et al.HMDB:a large video database for human motion recognition[C]//2011 International Conference on Computer Vision.IEEE,2011:2556-2563.
[3] GOYAL R,KAHOU S E,MICHALSKI V,et al.The Something Something Video Database for Learning and Evaluating Visual Common Sense[C]//ICCV.2017,1(2):3.
[4] KAY W,CARREIRA J,SIMONYAN K,et al.The kinetics human action video dataset[J].arXiv:1705.06950,2017.
[5] RUSSAKOVSKY O,DENG J,SUH,et al.Imagenet large scale visual recognition challenge[J].International Journal of Computer Vision,2015,115(3):211-252.
[6] SIMONYAN K,ZISSERMAN A.Two-stream convolutionalnetworks for action recognition in videos[C]//Advances in neural information processing systems.2014:568-576.
[7] DU T,BOURDEV L D,FERGUS R,et al.C3d:generic features for video analysis[J].Eprint Arxiv,2014,2(8).
[8] DONAHUE J,ANNE HENDRICKS L,GUADARRAMA S,et al.Long-term recurrent convolutional networks for visual recognition and description[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2015:2625-2634.
[9] CARREIRA J,ZISSERMAN A.Quo vadis,action recognition? a new model and the kinetics dataset[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:6299-6308.
[10] QIU Z,YAO T,MEI T.Learning spatio-temporal representation with pseudo-3d residual networks[C]//proceedings of the IEEE International Conference on Computer Vision.2017:5533-5541.
[11] XIE S,SUN C,HUANG J,et al.Rethinking spatiotemporal feature learning:Speed-accuracy trade-offs in video classification[C]//Proceedings of the European Conference on Computer Vision (ECCV).2018:305-321.
[12] TRAN D,WANG H,TORRESANI L,et al.A closer look at spatiotemporal convolutions for action recognition[C]//Proceedings of the IEEE conference on Computer Vision and Pattern Recognition.2018:6450-6459.
[13] CRASTO N,WEINZAEPFEL P,ALAHARI K,et al.MARS:Motion-Augmented RGB Stream for Action Recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2019:7882-7891.
[14] SUN S,KUANG Z,SHENGL,et al.Optical flow guided feature:A fast and robust motion representation for video action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:1390-1399.
[15] WANG L,XIONG Y,WANG Z,et al.Temporal segment networks:Towards good practices for deep action recognition[C]//European Conference on Computer Vision.Springer,Cham,2016:20-36.
[16] ZHOU B L,ANDONIAN A,OLIVA A,et al.Temporal relational reasoning in videos[C]//Proceedings of the EuropeanConfe-rence on Computer Vision (ECCV).2018:803-818.
[17] ZOLFAGHARI M,SINGH K,BROX T.Eco:Efficient convolutional network for online video understanding[C]//Proceedings of the European Conference on Computer Vision (ECCV).2018:695-712.
[18] MA C Y,CHEN M H,KIRA Z,et al.Ts-lstm and temporal-inception:Exploiting spatiotemporal dynamics for activity recognition[J].Signal Processing:Image Communication,2019,71:76-87.
[19] CHEN Q,ZHU X,LING Z,et al.Enhanced lstm for natural language inference[J].arXiv:1609.06038,2016.
[20] GAO L,GUO Z,ZHANG H,et al.Video captioning with attention-based LSTM and semantic consistency[J].IEEE Transactions on Multimedia,2017,19(9):2045-2055.
[21] SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale image recognition[J].arXiv:1409.1556,2014.
[22] IOFFE S,SZEGEDY C.Batch normalization:Accelerating deep network training by reducing internal covariate shift[J].arXiv:1502.03167,2015.
[23] HE K,ZHANG X,REN S,et al.Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:770-778.
[24] HUANG G,LIU Z,VAN DER MAATEN L,et al.Densely con-nected convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:4700-4708.
[1] WANG Rui-ping, JIA Zhen, LIU Chang, CHEN Ze-wei, LI Tian-rui. Deep Interest Factorization Machine Network Based on DeepFM [J]. Computer Science, 2021, 48(1): 226-232.
[2] YU Wen-jia, DING Shi-fei. Conditional Generative Adversarial Network Based on Self-attention Mechanism [J]. Computer Science, 2021, 48(1): 241-246.
[3] TONG Xin, WANG Bin-jun, WANG Run-zheng, PAN Xiao-qin. Survey on Adversarial Sample of Deep Learning Towards Natural Language Processing [J]. Computer Science, 2021, 48(1): 258-267.
[4] DING Yu, WEI Hao, PAN Zhi-song, LIU Xin. Survey of Network Representation Learning [J]. Computer Science, 2020, 47(9): 52-59.
[5] YE Ya-nan, CHI Jing, YU Zhi-ping, ZHAN Yu-liand ZHANG Cai-ming. Expression Animation Synthesis Based on Improved CycleGan Model and Region Segmentation [J]. Computer Science, 2020, 47(9): 142-149.
[6] DENG Liang, XU Geng-lin, LI Meng-jie, CHEN Zhang-jin. Fast Face Recognition Based on Deep Learning and Multiple Hash Similarity Weighting [J]. Computer Science, 2020, 47(9): 163-168.
[7] BAO Yu-xuan, LU Tian-liang, DU Yan-hui. Overview of Deepfake Video Detection Technology [J]. Computer Science, 2020, 47(9): 283-292.
[8] YUAN Ye, HE Xiao-ge, ZHU Ding-kun, WANG Fu-lee, XIE Hao-ran, WANG Jun, WEI Ming-qiang, GUO Yan-wen. Survey of Visual Image Saliency Detection [J]. Computer Science, 2020, 47(7): 84-91.
[9] WANG Wen-dao, WANG Run-ze, WEI Xin-lei, QI Yun-liang, MA Yi-de. Automatic Recognition of ECG Based on Stacked Bidirectional LSTM [J]. Computer Science, 2020, 47(7): 118-124.
[10] LIU Yan, WEN Jing. Complex Scene Text Detection Based on Attention Mechanism [J]. Computer Science, 2020, 47(7): 135-140.
[11] ZHANG Zhi-yang, ZHANG Feng-li, TAN Qi, WANG Rui-jin. Review of Information Cascade Prediction Methods Based on Deep Learning [J]. Computer Science, 2020, 47(7): 141-153.
[12] JIANG Wen-bin, FU Zhi, PENG Jing, ZHU Jian. 4Bit-based Gradient Compression Method for Distributed Deep Learning System [J]. Computer Science, 2020, 47(7): 220-226.
[13] CHEN Jin-yin, ZHANG Dun-Jie, LIN Xiang, XU Xiao-dong and ZHU Zi-ling. False Message Propagation Suppression Based on Influence Maximization [J]. Computer Science, 2020, 47(6A): 17-23.
[14] CHENG Zhe, BAI Qian, ZHANG Hao, WANG Shi-pu and LIANG Yu. Improving Hi-C Data Resolution with Deep Convolutional Neural Networks [J]. Computer Science, 2020, 47(6A): 70-74.
[15] HE Lei, SHAO Zhan-peng, ZHANG Jian-hua and ZHOU Xiao-long. Review of Deep Learning-based Action Recognition Algorithms [J]. Computer Science, 2020, 47(6A): 139-147.
Full text



[1] SUN Qi, JIN Yan, HE Kun and XU Ling-xuan. Hybrid Evolutionary Algorithm for Solving Mixed Capacitated General Routing Problem[J]. Computer Science, 2018, 45(4): 76 -82 .
[2] ZHANG Jia-nan and XIAO Ming-yu. Approximation Algorithm for Weighted Mixed Domination Problem[J]. Computer Science, 2018, 45(4): 83 -88 .
[3] ZHOU Yan-ping and YE Qiao-lin. L1-norm Distance Based Least Squares Twin Support Vector Machine[J]. Computer Science, 2018, 45(4): 100 -105 .
[4] LIU Bo-yi, TANG Xiang-yan and CHENG Jie-ren. Recognition Method for Corn Borer Based on Templates Matching in Muliple Growth Periods[J]. Computer Science, 2018, 45(4): 106 -111 .
[5] WANG Zhen-chao, HOU Huan-huan and LIAN Rui. Path Optimization Scheme for Restraining Degree of Disorder in CMT[J]. Computer Science, 2018, 45(4): 122 -125 .
[6] YANG Yu-qi, ZHANG Guo-an and JIN Xi-long. Dual-cluster-head Routing Protocol Based on Vehicle Density in VANETs[J]. Computer Science, 2018, 45(4): 126 -130 .
[7] ZHANG Jing and ZHU Guo-bin. Hot Topic Discovery Research of Stack Overflow Programming Website Based on CBOW-LDA Topic Model[J]. Computer Science, 2018, 45(4): 208 -214 .
[8] DING Shu-yang, LI Bing and SHI Hong-bo. Study on Flexible Job-shop Scheduling Problem Based on Improved Discrete Particle Swarm Optimization Algorithm[J]. Computer Science, 2018, 45(4): 233 -239 .
[9] LI Hao-yang and FU Yun-qing. Collaborative Filtering Recommendation Algorithm Based on Tag Clustering and Item Topic[J]. Computer Science, 2018, 45(4): 247 -251 .
[10] WANG Zhen-wu, LV Xiao-hua and HAN Xiao-hui. Survey of Terrain LOD Technology Based on Quadtree Segmentation[J]. Computer Science, 2018, 45(4): 34 -45 .