Computer Science (计算机科学), 2021, Vol. 48, Issue (1): 167-174. doi: 10.11896/jsjkx.200800198
陈洁婷, 王维莹, 金琴
CHEN Jie-ting, WANG Wei-ying, JIN Qin
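Abstract: This paper investigates multi-label video classification assisted by danmaku (time-synchronized "bullet-screen" comments). Multi-label video classification assigns a video several labels that describe its content from different perspectives, and it is closely tied to applications such as video recommendation. The main challenges in this area are the high cost of annotating multi-label video datasets and the need to understand video content from multiple perspectives. Danmaku is a recently emerged form of user commenting that has become popular with many users. Because user engagement is high, videos on danmaku video websites carry large numbers of tags that users add spontaneously, and these tags form naturally occurring multi-label data. We use them to build a multi-label video dataset and organize the hierarchical semantic relations among the video labels; the dataset will be publicly released in the future. In addition, the danmaku text modality contains abundant fine-grained information related to the video content, so, building on previous video classification work that fuses the visual and audio modalities, we introduce the danmaku text modality into multi-label video classification. Experiments on the clustering-based NeXtVLAD model, the attention-based DBoF model, and the temporal GRU model show that adding the danmaku modality improves the GAP metric by up to 23%, demonstrating that danmaku information benefits this task. We also explore how to exploit the label hierarchy in classification: we construct a label relation matrix to transform the labels and thereby incorporate label semantics into training. Experimental results show that adding label relations improves Hit@1 by 15%, indicating that it improves multi-label classification. Furthermore, MAP improves by 4% on fine-grained minor classes, suggesting that introducing label semantics helps predict classes with few samples and merits further study.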
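The three metrics reported in the abstract (GAP, Hit@1, MAP) can be computed roughly as follows. This is a minimal NumPy sketch, assuming the YouTube-8M challenge conventions: GAP pools each video's top-k predictions (k = 20 in that challenge) across all videos and computes a single average precision over the pooled list, Hit@1 is the fraction of videos whose top prediction is correct, and MAP is the mean of per-class average precision. The function names are illustrative, and the paper's own evaluation code and choice of k are not given in this section.

```python
import numpy as np


def gap_at_k(scores: np.ndarray, labels: np.ndarray, k: int = 20) -> float:
    """Global Average Precision over every video's top-k predictions.

    scores: (num_videos, num_classes) predicted confidences.
    labels: (num_videos, num_classes) multi-hot ground truth (0/1).
    """
    k = min(k, scores.shape[1])
    # Keep only the k highest-scoring classes of each video.
    top = np.argsort(-scores, axis=1)[:, :k]
    rows = np.arange(scores.shape[0])[:, None]
    pooled_scores = scores[rows, top].ravel()
    pooled_hits = labels[rows, top].ravel()
    # Rank the (score, hit) pairs of all videos together.
    hits = pooled_hits[np.argsort(-pooled_scores, kind="stable")]
    num_positives = labels.sum()  # all ground-truth labels, not just those in the top k
    if num_positives == 0:
        return 0.0
    precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # GAP = sum_i p(i) * delta_r(i), where delta_r(i) = hit_i / num_positives.
    return float((precision * hits).sum() / num_positives)


def hit_at_1(scores: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of videos whose single top prediction is one of their ground-truth labels."""
    top1 = scores.argmax(axis=1)
    return float(labels[np.arange(len(labels)), top1].mean())


def mean_average_precision(scores: np.ndarray, labels: np.ndarray) -> float:
    """Mean over classes of per-class average precision (classes with no positives are skipped)."""
    aps = []
    for c in range(scores.shape[1]):
        num_pos = labels[:, c].sum()
        if num_pos == 0:
            continue
        # Rank all videos by their score for class c.
        hits = labels[np.argsort(-scores[:, c], kind="stable"), c]
        precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
        aps.append((precision * hits).sum() / num_pos)
    return float(np.mean(aps)) if aps else 0.0


if __name__ == "__main__":
    scores = np.array([[0.9, 0.2, 0.6], [0.1, 0.8, 0.3]])
    labels = np.array([[1, 0, 0], [0, 1, 1]])
    print(round(gap_at_k(scores, labels, k=2), 4))  # 0.9167
    print(hit_at_1(scores, labels))  # 1.0
    print(round(mean_average_precision(scores, labels), 4))  # 0.8333
```

Because MAP averages per class, a gain on rare fine-grained classes moves it even when they contribute few positives overall, whereas GAP pools all predictions and is dominated by frequent labels.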
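The abstract does not specify how the label relation matrix is built or applied. The sketch below shows one plausible reading, assuming a tree-shaped tag hierarchy given as a hypothetical child-to-parent index map: the matrix R propagates each tagged label to all of its ancestors with a hypothetical `weight`, and the transformed (soft) targets are then used with an ordinary binary cross-entropy loss. All names here are illustrative and this is not presented as the authors' method.

```python
from typing import Dict

import numpy as np


def build_relation_matrix(parent: Dict[int, int], num_classes: int, weight: float = 1.0) -> np.ndarray:
    """Hypothetical label relation matrix R (num_classes x num_classes).

    R[i, i] = 1, R[i, j] = `weight` if label j is an ancestor of label i, else 0.
    `parent` maps a child label index to its parent index and must describe a tree.
    """
    R = np.eye(num_classes)
    for child in range(num_classes):
        node = child
        # Walk up to the root, marking every ancestor of `child`.
        while node in parent:
            node = parent[node]
            R[child, node] = weight
    return R


def transform_labels(labels: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Propagate multi-hot labels to their ancestors; clip so targets stay in [0, 1]."""
    return np.clip(labels @ R, 0.0, 1.0)


def bce_with_logits(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean binary cross-entropy for (possibly soft) targets, in a numerically stable form."""
    # max(x, 0) - x * t + log(1 + exp(-|x|)) equals -[t log s(x) + (1 - t) log(1 - s(x))].
    return float(np.mean(np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))))


if __name__ == "__main__":
    # Hypothetical 4-label hierarchy: 2 -> 1 -> 0, label 3 has no parent.
    R = build_relation_matrix({1: 0, 2: 1}, num_classes=4, weight=0.5)
    y = np.array([[0, 0, 1, 0], [1, 0, 1, 0]], dtype=float)
    print(transform_labels(y, R))
```

In the example, a video tagged only with the fine label 2 also receives target 0.5 for its ancestors 1 and 0. With `weight=1.0` the transform reduces to hard hierarchy-consistent targets (every ancestor of a tagged label becomes positive); a smaller weight gives ancestors only partial credit.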