Computer Science (计算机科学), 2019, Vol. 46, Issue 3: 131-136. doi: 10.11896/j.issn.1002-137X.2019.03.019
谭凯,吴庆波,孟凡满,许林峰
TAN Kai, WU Qing-bo, MENG Fan-man, XU Lin-feng
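Abstract: With the wide application of video advertisements in retrieval, user recommendation, and related fields, classifying video advertisements has become an important problem. Unlike existing video classification tasks, video advertisements have their own characteristics: 1) in the temporal domain, the product appears aperiodically and sparsely in an advertisement video, so the classifier must reject the interference of many frames that are irrelevant to the video category and rely on the few relevant frames; 2) in the spatial domain, each frame contains a complex background in addition to the product, which makes it hard to capture product information effectively. To address these problems, this paper proposes a video advertisement classification method based on shot segmentation and a spatial attention model, abbreviated SSSA. To deal with the many interfering frames, the method samples video frames with a segmentation approach based on shot transitions. To deal with complex backgrounds within frames, a visual attention mechanism is introduced into the network to help it extract discriminative features from product-related regions. To verify the effectiveness of the proposed method, a dataset of more than 1,000 video advertisements (abbreviated TAV) was constructed, and eye-movement data were collected to train the attention model. Experimental results show that the proposed SSSA method improves performance by 10% over existing video classification methods.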
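The two ideas in the abstract, sampling frames per shot and weighting features by attention, can be illustrated with a minimal sketch. The abstract does not say which shot detector the authors use or how the attention map enters the network, so everything here is an assumption made for illustration: shot changes are detected by thresholding the L1 distance between intensity histograms of consecutive frames, one middle frame is kept per shot, and attention is applied as a weighted average over per-location feature vectors. The names `histogram`, `shot_boundaries`, `sample_keyframes`, and `attention_pool`, and the `threshold` value, are hypothetical.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 8, max_value: int = 256) -> List[float]:
    """Normalized intensity histogram of a flattened grayscale frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    total = float(len(frame))
    return [c / total for c in counts]


def shot_boundaries(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Start index of each shot (assumed detector: histogram L1 distance > threshold)."""
    if not frames:
        return []
    boundaries = [0]
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        dist = sum(abs(a - b) for a, b in zip(prev, cur))
        if dist > threshold:
            boundaries.append(i)  # large change: frame i opens a new shot
        prev = cur
    return boundaries


def sample_keyframes(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Index of the middle frame of every detected shot."""
    starts = shot_boundaries(frames, threshold)
    ends = starts[1:] + [len(frames)]
    return [(s + e - 1) // 2 for s, e in zip(starts, ends)]


def attention_pool(features: Sequence[Sequence[float]], attention: Sequence[float]) -> List[float]:
    """Attention-weighted average of per-location feature vectors."""
    total = sum(attention)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(attention, features)) / total for d in range(dim)]


# Toy video: three dark frames, two bright frames, four mid-gray frames.
video = [[10] * 16] * 3 + [[200] * 16] * 2 + [[120] * 16] * 4
keyframes = sample_keyframes(video)

# Two spatial locations; the first receives three times the attention.
pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
```

Sampling one frame per shot keeps the frame count proportional to the number of scene changes rather than to video length, which matches the abstract's goal of discarding redundant, irrelevant frames. A trained model would replace the hand-set attention weights with a predicted saliency map.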