Computer Science (计算机科学), 2019, Vol. 46, Issue 3: 131-136. doi: 10.11896/j.issn.1002-137X.2019.03.019
TAN Kai (谭凯), WU Qing-bo (吴庆波), MENG Fan-man (孟凡满), XU Lin-feng (许林峰)
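Abstract: With the wide use of video advertisements in retrieval and user recommendation, classifying video advertisements has become an important problem. Unlike existing video classification tasks, video advertisements have two distinctive characteristics. 1) In the temporal domain, the product appears aperiodically and sparsely, so a classifier must rely on a small number of relevant frames while rejecting the many frames unrelated to the video's category. 2) In the spatial domain, frames contain complex backgrounds in addition to the product, which makes it hard to capture product information effectively. To address these problems, this paper proposes a video advertisement classification method based on shot segmentation and a spatial attention model, abbreviated SSSA. To handle the large number of distracting frames, frames are sampled with a segmentation method based on shot transitions. To handle complex backgrounds, a visual attention mechanism is introduced into the network to help it extract discriminative features from product-related regions. To validate the method, a dataset of more than 1,000 video advertisements (abbreviated TAV) was constructed, and eye-movement data were collected to train the attention model. Experimental results show that the proposed SSSA method improves performance by 10% over existing video classification methods.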
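The abstract does not say how shot transitions are detected or how the attention map is applied inside the network, so the following is only a minimal sketch under stated assumptions, not the authors' SSSA implementation. It assumes a cut is declared when the intensity histograms of consecutive frames differ by more than a threshold, that one frame is sampled from the middle of each shot, and that attention acts as a normalized spatial weighting over per-location features. The names `shot_starts`, `sample_frames`, and `attention_pool`, the toy grayscale frames, and the threshold value are all hypothetical.

```python
from typing import List, Sequence

Frame = List[List[int]]  # toy grayscale frame: rows of pixel values in 0..255


def histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Normalized intensity histogram of one frame."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for pixel in row:
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]


def shot_starts(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Indices of frames that begin a new shot (assumed histogram-difference rule)."""
    if not frames:
        return []
    starts = [0]
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        # L1 distance between consecutive histograms; a large jump suggests a cut.
        if sum(abs(a - b) for a, b in zip(prev, cur)) > threshold:
            starts.append(i)
        prev = cur
    return starts


def sample_frames(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Index of the middle frame of each detected shot."""
    starts = shot_starts(frames, threshold)
    ends = starts[1:] + [len(frames)]
    return [(s + e - 1) // 2 for s, e in zip(starts, ends)]


def attention_pool(features: List[List[List[float]]],
                   attention: List[List[float]]) -> List[float]:
    """Attention-weighted average of the feature vectors on a spatial grid."""
    total = sum(sum(row) for row in attention)
    if total <= 0:
        raise ValueError("attention map must have positive mass")
    pooled = [0.0] * len(features[0][0])
    for feat_row, att_row in zip(features, attention):
        for vec, weight in zip(feat_row, att_row):
            for k, value in enumerate(vec):
                pooled[k] += (weight / total) * value
    return pooled


# Toy video: three dark frames followed by two bright frames (one cut).
dark = [[10, 10], [10, 10]]
bright = [[200, 200], [200, 200]]
video = [dark, dark, dark, bright, bright]
sampled = sample_frames(video)

# Toy 1x2 feature grid where attention favors the first location.
pooled = attention_pool([[[1.0, 0.0], [0.0, 1.0]]], [[3.0, 1.0]])
```

Sampling one frame per shot keeps the number of frames proportional to the number of scene changes rather than to video length, which is one plausible way to reduce the share of frames unrelated to the product. The attention weighting lets locations marked as salient (for example, by a model trained on eye-movement data) dominate the pooled feature.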