Computer Science, 2019, Vol. 46, Issue 3: 131-136. doi: 10.11896/j.issn.1002-137X.2019.03.019

• 2018 China Multimedia Conference •



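  1. School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
  Received: 2018-07-20 Revised: 2018-09-29 Online: 2019-03-15 Published: 2019-03-22
  • Corresponding author: WU Qing-bo (born 1985), male, Ph.D., associate professor. His main research interests are image and video coding and quality assessment. E-mail: qbwu@uestc.edu.cn
  • About the authors: TAN Kai (born 1988), male, Ph.D. candidate. His main research interests are visual attention, object detection, and quality assessment. E-mail: kaitanuestc@gmail.com. MENG Fan-man (born 1984), male, Ph.D., associate professor. His main research interests are image segmentation and object detection. XU Lin-feng (born 1976), male, Ph.D., associate professor. His main research interests are visual attention, image and video coding, visual signal processing, and multimedia communication systems.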

Video Advertisement Classification Method Based on Shot Segmentation and Spatial Attention Model

TAN Kai, WU Qing-bo, MENG Fan-man, XU Lin-feng   

  1. (School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China)
  Received: 2018-07-20 Revised: 2018-09-29 Online: 2019-03-15 Published: 2019-03-22

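Abstract: With the wide use of video advertisements in fields such as retrieval and user recommendation, video advertisement classification has become an important problem. Unlike existing video classification tasks, video advertisements have characteristics of their own: 1) in the temporal domain, product objects appear aperiodically and sparsely in an advertisement video, so the classifier must exclude the interference of a large number of frames unrelated to the video category and rely on a small number of relevant frames; 2) in the spatial domain, video frames contain complex backgrounds in addition to the product, which makes it difficult to capture product information effectively. To address these problems, this paper proposes a video advertisement classification method based on shot segmentation and a spatial attention model, abbreviated as SSSA. To deal with the many interfering frames in a video, a segmentation method based on shot transitions is used to sample video frames. To deal with the complex backgrounds in video frames, a visual attention mechanism is introduced into the network to help it extract discriminative features from product-related regions. To verify the effectiveness of the proposed method, a database of more than 1,000 video advertisements (abbreviated as TAV) was constructed, and eye-movement data were collected to train the attention model. Experimental results show that the proposed SSSA method improves performance by 10% over existing video classification methods.

Keywords: Annotation, Classification, Video advertisement, Attention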
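
The shot-based frame sampling mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which shot-boundary detector or sampling rule SSSA uses, so everything here is an assumption made for illustration: frames are represented by normalized histograms, a shot boundary is declared when consecutive histograms differ by more than a hypothetical `threshold`, and the middle frame of each shot is sampled.

```python
from typing import List, Sequence


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the L1 distance between two normalized histograms (range 0..1)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shot_starts(histograms: List[Sequence[float]],
                       threshold: float = 0.5) -> List[int]:
    """Return the index of the first frame of every detected shot.

    A new shot is assumed to start wherever two consecutive frames differ
    by more than `threshold` (an illustrative value, not from the paper).
    """
    if not histograms:
        return []
    starts = [0]
    for i in range(1, len(histograms)):
        if histogram_distance(histograms[i - 1], histograms[i]) > threshold:
            starts.append(i)
    return starts


def sample_one_frame_per_shot(histograms: List[Sequence[float]],
                              threshold: float = 0.5) -> List[int]:
    """Pick the middle frame index of each detected shot."""
    starts = detect_shot_starts(histograms, threshold)
    ends = starts[1:] + [len(histograms)]  # exclusive end of each shot
    return [(s + e - 1) // 2 for s, e in zip(starts, ends)]
```

Sampling one frame per shot, rather than at a fixed time interval, is one plausible way to keep a short product shot from being skipped while avoiding many near-identical frames from a long shot.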

Abstract: As video advertisements are increasingly used in areas such as search and user recommendation, advertisement video classification has become an important issue and poses a significant challenge for computer vision. Unlike existing video classification tasks, advertisement video classification faces two challenges. First, advertised products appear in an advertisement video aperiodically and sparsely. This means that most frames are irrelevant to the advertisement category and can interfere with classification models. Second, advertisement videos contain complex backgrounds, which make it hard to extract useful information about the product. To solve these problems, this paper proposes an advertisement video classification method based on shot segmentation and a spatial attention model (SSSA). To address the interference of irrelevant frames, a shot-based partitioning method is used to sample frames. To reduce the influence of complex backgrounds on feature extraction, an attention mechanism is embedded into SSSA to locate products and extract discriminative features from the attention area that is most related to the advertised products. An attention prediction network (APN) is trained to predict the attention map. To verify the proposed model, this paper introduces a new thousand-level dataset for advertisement video classification named TAV, and gaze data were also collected to train the APN. Experiments on the TAV dataset demonstrate that the proposed model improves performance by about 10% compared with state-of-the-art video classification methods.

Key words: Annotation, Attention, Classification, Video advertisement
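
To make the role of the attention map concrete, the following is a minimal sketch of attention-weighted spatial pooling. The abstract does not specify how the predicted attention map is combined with the network features, so this weighting scheme, the function name `attention_pool`, and the nested-list data layout are all assumptions chosen for illustration, not the authors' implementation.

```python
from typing import List


def attention_pool(features: List[List[List[float]]],
                   attention: List[List[float]]) -> List[float]:
    """Pool an H x W x C feature map into a C-vector, weighted by attention.

    `attention` is an H x W map of non-negative scores (for example, the
    output of an attention prediction network). It is normalized to sum
    to 1, so locations with high attention dominate the pooled feature.
    """
    total = sum(sum(row) for row in attention)
    if total <= 0:
        raise ValueError("attention map must contain positive mass")
    channels = len(features[0][0])
    pooled = [0.0] * channels
    for feat_row, att_row in zip(features, attention):
        for feat, weight in zip(feat_row, att_row):
            w = weight / total  # normalized attention weight
            for c in range(channels):
                pooled[c] += w * feat[c]
    return pooled
```

With a uniform attention map this reduces to ordinary average pooling; a map concentrated on the product region suppresses the contribution of background locations.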


  • CLC Number: TP391.9
