Computer Science ›› 2019, Vol. 46 ›› Issue (3): 131-136. doi: 10.11896/j.issn.1002-137X.2019.03.019

• 2018 China Multimedia Conference •

Video Advertisement Classification Method Based on Shot Segmentation and Spatial Attention Model

TAN Kai, WU Qing-bo, MENG Fan-man, XU Lin-feng

  1. School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
  • Received: 2018-07-20 Revised: 2018-09-29 Online: 2019-03-15 Published: 2019-03-22
  • Corresponding author: WU Qing-bo (born 1985), male, Ph.D., associate professor. His main research interests are image and video coding and quality assessment. E-mail: qbwu@uestc.edu.cn
  • About the authors: TAN Kai (born 1988), male, Ph.D. candidate. His main research interests are visual attention, object detection, and quality assessment. E-mail: kaitanuestc@gmail.com. MENG Fan-man (born 1984), male, Ph.D., associate professor. His main research interests are image segmentation and object detection. XU Lin-feng (born 1976), male, Ph.D., associate professor. His main research interests are visual attention, image and video coding, visual signal processing, and multimedia communication systems.
  • Supported by:
    the National Natural Science Foundation of China (61601102, 61502084, 61871087)

Abstract: As video advertisements are increasingly used in retrieval and user recommendation, classifying them has become an important problem. Unlike existing video classification tasks, advertisement video classification poses two specific challenges. First, in the temporal domain, the advertised product appears aperiodically and sparsely, so most frames are irrelevant to the advertisement category and can interfere with the classifier; the category has to be inferred from a small number of relevant frames. Second, in the spatial domain, frames contain complex backgrounds in addition to the product, which makes it hard to capture product information. To address these problems, this paper proposes an advertisement video classification method based on shot segmentation and a spatial attention model (SSSA). To reduce interference from irrelevant frames, frames are sampled with a segmentation method based on shot changes. To reduce the influence of complex backgrounds on feature extraction, a visual attention mechanism is embedded in the network so that discriminative features are extracted from the regions most related to the advertised product; an attention prediction network (APN) is trained to predict the attention map. To verify the proposed method, a new dataset of more than 1,000 advertisement videos, named TAV, was built, and gaze (eye-tracking) data were collected to train the APN. Experiments on the TAV dataset show that SSSA improves performance by about 10% over state-of-the-art video classification methods.

Key words: Classification, Video advertisement, Attention, Annotation

CLC Number:

  • TP391.9
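
The two ideas named in the abstract, sampling frames by shot and weighting features by an attention map, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the shot detector or the pooling rule, so the histogram-difference detector, the middle-frame sampling rule, the `threshold` value, and all function names below are assumptions chosen for illustration. Frames are modeled as flattened lists of grayscale intensities in 0..255.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 8, max_value: int = 256) -> List[float]:
    """Normalized intensity histogram of a non-empty flattened grayscale frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // max_value, bins - 1)] += 1
    return [c / float(len(frame)) for c in counts]


def shot_boundaries(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Indices at which a new shot starts (assumed histogram-difference detector)."""
    if not frames:
        return []
    starts = [0]
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        # The L1 distance between two normalized histograms lies in [0, 2].
        if sum(abs(a - b) for a, b in zip(prev, cur)) > threshold:
            starts.append(i)
        prev = cur
    return starts


def sample_one_frame_per_shot(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Return the index of the middle frame of every detected shot."""
    starts = shot_boundaries(frames, threshold)
    ends = starts[1:] + [len(frames)]
    return [(s + e - 1) // 2 for s, e in zip(starts, ends)]


def attention_pool(features: Sequence[Sequence[float]], attention: Sequence[float]) -> List[float]:
    """Attention-weighted average of per-location feature vectors.

    features[i] is the feature vector at spatial location i and attention[i]
    is the non-negative attention value predicted for that location.
    """
    total = sum(attention)
    if total == 0:
        # Fall back to plain average pooling when the map is all zeros.
        weights = [1.0 / len(attention)] * len(attention)
    else:
        weights = [a / total for a in attention]
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


if __name__ == "__main__":
    # Three dark frames, four bright frames, two dark frames: three shots.
    video = [[10] * 16] * 3 + [[200] * 16] * 4 + [[10] * 16] * 2
    print(sample_one_frame_per_shot(video))
    print(attention_pool([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0]))
```

Sampling one frame per shot keeps the number of frames proportional to the number of scene changes rather than to the video length, which matches the stated goal of discarding redundant frames. The weighted pooling lets high-attention locations dominate the pooled feature, which is one simple way to suppress background regions.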