Computer Science ›› 2019, Vol. 46 ›› Issue (3): 131-136. doi: 10.11896/j.issn.1002-137X.2019.03.019

• 2018 China Multimedia Conference •



  • Corresponding author: WU Qing-bo (born 1985), male, Ph.D, associate professor; his main research interests include image/video coding and quality assessment. E-mail: qbwu@uestc.edu.cn
  • About the authors: TAN Kai (born 1988), male, Ph.D candidate; his main research interests include visual attention, object detection and quality assessment. E-mail: kaitanuestc@gmail.com. MENG Fan-man (born 1984), male, Ph.D, associate professor; his main research interests include image segmentation and object detection. XU Lin-feng (born 1976), male, Ph.D, associate professor; his main research interests include visual attention, image/video coding, visual signal processing and multimedia communication systems.

Video Advertisement Classification Method Based on Shot Segmentation and Spatial Attention Model

TAN Kai, WU Qing-bo, MENG Fan-man, XU Lin-feng   

  1. (School of Information and Communication Engineering,University of Electronic Science and Technology of China,Chengdu 611731,China)
  • Received:2018-07-20 Revised:2018-09-29 Online:2019-03-15 Published:2019-03-22


Abstract: As video advertisement is increasingly used in areas such as retrieval and user recommendation, advertisement video classification becomes an important issue and poses a significant challenge for computer vision. Different from existing video classification tasks, advertisement video classification faces two challenges. First, advertised products appear in advertisement videos aperiodically and sparsely, which means that most frames are irrelevant to the advertisement category and can interfere with classification models. Second, the complex background in advertisement videos makes it hard to extract useful information about the product. To solve these problems, this paper proposed an advertisement video classification method based on shot segmentation and a spatial attention model (SSSA). To address the interference of irrelevant frames, a shot-based partitioning method was used to sample frames. To reduce the influence of the complex background on feature extraction, an attention mechanism was embedded into SSSA to locate products and extract discriminative features from the attention area most related to the advertised products. An attention prediction network (APN) was trained to predict the attention map. To verify the proposed model, this paper introduced a new thousand-level dataset for advertisement video classification named TAV, and gaze data were also collected to train the APN. Experiments on the TAV dataset demonstrate that the proposed model improves performance by about 10% compared with state-of-the-art video classification methods.
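The shot-based sampling idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the paper does not specify its shot-boundary detector, so a simple histogram-difference threshold is assumed here, and frames are reduced to toy normalized histograms for brevity.

```python
# Hedged sketch: sample one representative frame per shot instead of
# uniformly over the video, so that most irrelevant frames are skipped.
# Shot boundaries are assumed to be detected by thresholding the L1
# distance between consecutive frame histograms.

def hist_diff(h1, h2):
    """L1 distance between two normalized histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def detect_shot_boundaries(histograms, threshold=0.5):
    """Indices where the consecutive-frame histogram difference exceeds threshold."""
    return [i for i in range(1, len(histograms))
            if hist_diff(histograms[i - 1], histograms[i]) > threshold]

def sample_one_frame_per_shot(num_frames, boundaries):
    """Pick the middle frame of each shot as its representative."""
    starts = [0] + boundaries
    ends = boundaries + [num_frames]
    return [(s + e - 1) // 2 for s, e in zip(starts, ends)]

# Toy video: two visually distinct shots of 4 frames each.
hists = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
cuts = detect_shot_boundaries(hists)                 # -> [4]
reps = sample_one_frame_per_shot(len(hists), cuts)   # -> [1, 5]
print(cuts, reps)
```

The classifier would then see only the representative frames (here, one per shot), which is how the method avoids the interference of irrelevant frames described above.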

Key words: Annotation, Attention, Classification, Video advertisement
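The spatial-attention idea can likewise be sketched: a predicted attention map re-weights the spatial positions of a convolutional feature map before pooling, so that product regions dominate the resulting descriptor. The shapes and the normalization below are assumptions for illustration; in the paper the attention map comes from the learned APN, not a hand-set grid.

```python
# Hedged sketch of attention-weighted pooling: positions with high
# attention contribute more to the pooled feature vector.

def attention_pool(feature_map, attention_map):
    """feature_map: H x W grid of C-dim feature vectors;
    attention_map: H x W non-negative weights (normalized internally)."""
    total = sum(sum(row) for row in attention_map)
    c = len(feature_map[0][0])
    pooled = [0.0] * c
    for i, row in enumerate(feature_map):
        for j, vec in enumerate(row):
            w = attention_map[i][j] / total
            for k in range(c):
                pooled[k] += w * vec[k]
    return pooled

# 2x2 feature map with 1-D features; attention concentrated on the
# top-left cell, e.g. where the advertised product appears.
fmap = [[[1.0], [2.0]], [[3.0], [4.0]]]
attn = [[8.0, 0.0], [0.0, 0.0]]
print(attention_pool(fmap, attn))  # -> [1.0]
```

With uniform attention this reduces to ordinary average pooling; a peaked attention map instead suppresses the complex background and keeps the feature extraction focused on the product region.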


  • CLC number: TP391.9