Computer Science ›› 2019, Vol. 46 ›› Issue (3): 131-136. doi: 10.11896/j.issn.1002-137X.2019.03.019

• ChinaMM2018 •

Video Advertisement Classification Method Based on Shot Segmentation and Spatial Attention Model

TAN Kai, WU Qing-bo, MENG Fan-man, XU Lin-feng   

  1. (School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China)
  • Received: 2018-07-20  Revised: 2018-09-29  Online: 2019-03-15  Published: 2019-03-22

Abstract: As video advertisements are increasingly used in areas such as search and user recommendation, advertisement video classification has become an important problem and poses a significant challenge for computer vision. Unlike existing video classification tasks, advertisement video classification faces two challenges. First, the advertised products appear in a video aperiodically and sparsely, so most frames are irrelevant to the advertisement category and can interfere with classification models. Second, advertisement videos contain complex backgrounds, which makes it hard to extract useful information about the products. To solve these problems, this paper proposed an advertisement video classification method based on shot segmentation and a spatial attention model (SSSA). To suppress the interference of irrelevant frames, a shot-based partitioning method was used to sample frames. To reduce the influence of the complex background on feature extraction, an attention mechanism was embedded into SSSA to locate the products and extract discriminative features from the attention area most related to the advertised products, and an attention prediction network (APN) was trained to predict the attention map. To verify the proposed model, this paper introduced a new thousand-scale advertisement video classification dataset named TAV, and gaze data were also collected to train the APN. Experiments on the TAV dataset demonstrate that the proposed model improves performance by about 10% over state-of-the-art video classification methods.
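
For readers who want a concrete picture of the pipeline summarized in the abstract, the following minimal PyTorch sketch illustrates the two ideas: one key frame is sampled per shot, and a spatial attention map predicted by a small network re-weights the frame features before classification. All names, architectures, and thresholds here (sample_one_frame_per_shot, AttentionPredictionNetwork, SSSAClassifier, diff_threshold) are illustrative assumptions and do not reproduce the paper's exact design or its APN trained on gaze data.

    # Hypothetical sketch of the SSSA pipeline described in the abstract (PyTorch).
    # Shot detection, the APN architecture, and all hyper-parameters are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def sample_one_frame_per_shot(frames, diff_threshold=0.3):
        """Shot-based sampling: declare a shot boundary when the mean absolute
        colour difference between consecutive frames exceeds a threshold,
        then keep the middle frame of each shot (threshold is illustrative)."""
        # frames: (T, 3, H, W) tensor with values in [0, 1]
        diffs = (frames[1:] - frames[:-1]).abs().mean(dim=(1, 2, 3))
        cuts = (torch.nonzero(diffs > diff_threshold).flatten() + 1).tolist()
        boundaries = [0] + cuts + [frames.size(0)]
        keyframes = [frames[(s + e) // 2]
                     for s, e in zip(boundaries[:-1], boundaries[1:]) if e > s]
        return torch.stack(keyframes)                      # (S, 3, H, W)

    class AttentionPredictionNetwork(nn.Module):
        """Toy stand-in APN: a small conv net producing a normalized spatial
        attention map; the paper's APN is trained on collected gaze data."""
        def __init__(self):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 1, 3, stride=2, padding=1))
        def forward(self, x):                               # x: (N, 3, H, W)
            a = self.conv(x)                                # (N, 1, H/4, W/4)
            return torch.softmax(a.flatten(2), dim=-1).view_as(a)

    class SSSAClassifier(nn.Module):
        """Attention-weighted pooling of per-frame features, averaged over
        the sampled key frames, followed by a linear classifier."""
        def __init__(self, num_classes, feat_dim=64):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, feat_dim, 3, stride=4, padding=1), nn.ReLU())
            self.apn = AttentionPredictionNetwork()
            self.fc = nn.Linear(feat_dim, num_classes)
        def forward(self, keyframes):                       # (S, 3, H, W)
            feat = self.backbone(keyframes)                 # (S, C, h, w)
            attn = F.interpolate(self.apn(keyframes), size=feat.shape[-2:],
                                 mode='bilinear', align_corners=False)
            pooled = (feat * attn).sum(dim=(2, 3))          # attention-weighted pooling
            return self.fc(pooled.mean(dim=0, keepdim=True))  # video-level logits

In use, a video tensor of shape (T, 3, H, W) would first pass through sample_one_frame_per_shot and the resulting key frames through SSSAClassifier; in the paper the attention map comes from an APN supervised with gaze annotations rather than the untrained stand-in shown here.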

Key words: Annotation, Attention, Classification, Video advertisement

CLC Number: TP391.9