Computer Science ›› 2021, Vol. 48 ›› Issue (1): 167-174.doi: 10.11896/jsjkx.200800198

• Computer Graphics & Multimedia •

Multi-label Video Classification Assisted by Danmaku

CHEN Jie-ting, WANG Wei-ying, JIN Qin   

  1. School of Information,Renmin University of China,Beijing 100872,China
  • Received:2020-08-29 Revised:2020-10-05 Online:2021-01-15 Published:2021-01-15
  • About author: CHEN Jie-ting, born in 1997, postgraduate, is a member of China Computer Federation. Her main research interests include multimedia computing.
    JIN Qin, born in 1972, Ph.D, professor, Ph.D supervisor, is a member of China Computer Federation. Her main research interests include multimedia computing and human-computer interaction.
  • Supported by:
    National Natural Science Foundation of China(61772535),Beijing Municipal Natural Science Foundation(4192028) and National Key Research and Development Plan(2016YFB1001202).

Abstract: This work explores multi-label video classification assisted by danmaku. Multi-label classification associates a video with multiple tags that describe it from different aspects, which benefits downstream video understanding tasks such as video recommendation. The task poses two challenges: the high cost of annotating a dataset, and the need to understand video from multi-aspect, multimodal perspectives. Danmaku is an emerging form of online commenting in which viewers post time-synchronized comments overlaid on the video. Because of high user engagement, danmaku videos carry a large amount of user-generated annotation that can be used directly as classification data. This work collects a multi-label danmaku video dataset and, for the first time on danmaku video data, builds a hierarchical label correlation structure. The dataset will be released in the future. Danmaku provides informative, fine-grained interaction data tied to the video content, so this paper introduces the danmaku modality to assist classification, whereas most previous work combines only the visual and audio modalities. The cluster-based NeXtVLAD model, the attention-enhanced DBoF model, and the temporal GRU model are chosen as baselines. Experiments show that danmaku data is helpful, improving GAP by 0.23. This paper also explores the use of label correlation, updating the video labels with a label relationship matrix to integrate label semantic information into training. Experiments show that leveraging label correlation improves Hit@1 by 0.15 and improves MAP on fine-grained labels by 0.04, which indicates that label semantic information benefits the prediction of small classes and is worth exploring further.
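As a rough illustration of the label-correlation step described in the abstract, the sketch below blends each video's multi-hot label vector with labels it correlates with, using a label relationship matrix. This is a minimal Python sketch only; the function name propagate_labels, the blending weight alpha, and the row normalization are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def propagate_labels(y, corr, alpha=0.5):
    """Blend multi-hot video labels with correlated labels.

    y:     (num_videos, num_labels) binary label matrix
    corr:  (num_labels, num_labels) label relationship matrix, e.g. built
           from label co-occurrence or a hierarchical label structure
    alpha: weight of the original hard labels vs. the propagated soft labels
           (assumed hyperparameter for this sketch)
    """
    # Row-normalize the correlation matrix so propagated scores stay bounded
    row_sums = corr.sum(axis=1, keepdims=True)
    corr_norm = corr / np.clip(row_sums, 1e-8, None)
    # Each label receives mass from the labels it correlates with
    soft = np.clip(y @ corr_norm, 0.0, 1.0)
    # Updated soft targets used in place of the original multi-hot labels
    return alpha * y + (1.0 - alpha) * soft

The resulting soft targets can then be used with a standard multi-label loss (e.g. per-label binary cross-entropy), so that semantically related fine-grained labels receive partial supervision even when they are not explicitly annotated.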

Key words: Classification, Danmaku, Label correlation, Multi-label, Multi-modal, Video

CLC Number: TP399
[1] LIN R,XIAO J,FAN J.NeXtVLAD:An efficient neural network to aggregate frame-level features for large-scale video classification[C]//Proceedings of the European Conference on Computer Vision (ECCV).Munich,Germany,2018.
[2] GARG S.Learning video features for multi-label classification[C]//Proceedings of the European Conference on Computer Vision (ECCV).Munich,Germany,2018.
[3] ABU-EL-HAIJA S,KOTHARI N,LEE J,et al.YouTube-8M:A large-scale video classification benchmark[J].arXiv:1609.08675.
[4] CHO K,VAN MERRIENBOER B,BAHDANAU D,et al.On the properties of neural machine translation:Encoder-decoder approaches[J].arXiv:1409.1259.
[5] LEE J,NATSEV A,READE W,et al.The 2nd YouTube-8M Large-Scale Video Understanding Challenge[C]//Proceedings of the European Conference on Computer Vision (ECCV).Munich,Germany,2018:193-205.
[6] YANG W,RUAN N,GAO W,et al.Crowdsourced time-sync video tagging using semantic association graph[C]//2017 IEEE International Conference on Multimedia and Expo (ICME).Hong Kong,China,2017:547-552.
[7] LIAO Z,XIAN Y,YANG X,et al.TSCSet:A crowdsourced time-sync comment dataset for exploration of user experience improvement[C]//23rd International Conference on Intelligent User Interfaces.Tokyo,Japan,2018:641-652.
[8] BAI Q,HU Q V,GE L,et al.Stories That Big Danmaku Data Can Tell as a New Media[J].IEEE Access,2019,7:53509-53519.
[9] MA S,CUI L,DAI D,et al.Livebot:Generating live video comments based on visual and textual contexts[C]//Proceedings of the AAAI Conference on Artificial Intelligence.Hilton Hawaiian Village,Honolulu,Hawaii,USA,2019,33:6810-6817.
[10] OLSEN D R,MOON B.Video summarization based on user interaction[C]//Proceedings of the 9th European Conference on Interactive TV and Video.Lisbon,Portugal,2011:115-122.
[11] WANG X,JIANG Y G,CHAI Z,et al.Real-time summarization of user-generated videos based on semantic recognition[C]//Proceedings of the 22nd ACM International Conference on Multimedia.Orlando,Florida,USA,2014:849-852.
[12] SÁNCHEZ J,PERRONNIN F,MENSINK T,et al.Image classification with the fisher vector:Theory and practice[J].International Journal of Computer Vision,2013,105(3):222-245.
[13] JÉGOU H,DOUZE M,SCHMID C,et al.Aggregating local descriptors into a compact image representation[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.San Francisco,California,USA,2010:3304-3311.
[14] HOCHREITER S,SCHMIDHUBER J.Long short-term memory[J].Neural Computation,1997,9(8):1735-1780.
[15] MIECH A,LAPTEV I,SIVIC J.Learnable pooling with context gating for video classification[J].arXiv:1706.06905.
[16] JÉGOU H,DOUZE M,SCHMID C,et al.Aggregating local descriptors into a compact image representation[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.San Francisco,California,USA,2010:3304-3311.
[17] PENG H,LI J,HE Y,et al.Large-scale hierarchical text classification with recursively regularized deep graph-cnn[C]//Proceedings of the 2018 World Wide Web Conference.Lyon,France,2018:1063-1072.
[18] WANG L,CHEN S,ZHOU H.Boosting Up Segment-level Video Classification Performance with Label Correlation and Reweighting[EB/OL].https://static.googleusercontent.com/media/research.google.com/zh-CN//youtube8m/workshop2019/c_07.pdf.
[19] BANERJEE S,AKKAYA C,PEREZ-SORROSAL F,et al.Hierarchical Transfer Learning for Multi-label Text Classification[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.Fortezza da Basso,Florence,Italy,2019:6295-6300.
[20] CHEN B,HUANG X,XIAO L,et al.Hyperbolic Capsule Networks for Multi-Label Classification[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.Seattle,Washington,USA,2020:3115-3124.
[21] POUYANFAR S,WANG T,CHEN S C.Residual Attention-Based Fusion for Video Classification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.Long Beach,California,USA,2019.
[22] WANG Z,KUAN K,RAVAUT M,et al.Truly multi-modal YouTube-8M video classification with video,audio,and text[J].arXiv:1706.05461.
[23] HE X,PENG Y.Fine-grained image classification via combining vision and language[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Honolulu,Hawaii,USA,2017:5994-6002.
[24] Chinese Association for Artificial Intelligence,Zhihu.2017 Zhihu Kanshan Cup Machine Learning Challenge[EB/OL].https://www.biendata.xyz/competition/zhihu/.
[25] PARTALAS I,KOSMOPOULOS A,BASKIOTIS N,et al.Lshtc:A benchmark for large-scale text classification[J].arXiv:1503.08581.
[26] HE K,ZHANG X,REN S,et al.Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Las Vegas,NV,USA,2016:770-778.