Computer Science ›› 2022, Vol. 49 ›› Issue (9): 111-122.doi: 10.11896/jsjkx.220500130
NIE Xiu-shan¹, PAN Jia-nan¹, TAN Zhi-fang¹, LIU Xin-fang², GUO Jie¹, YIN Yi-long²