Weakly Supervised Learning-based Object Detection: A Survey

doi:10.11896/jsjkx.181001899

Abstract

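Abstract: Object detection is one of the fundamental problems in computer vision, and supervised learning-based algorithms are currently the mainstream approach. In existing research, high-precision image annotation is a prerequisite for strongly supervised object detection to perform well. In real-world scenes, however, factors such as background complexity and object diversity make image annotation very time-consuming and labor-intensive. With the continued development of deep learning, how to obtain good training results from low-cost image annotations has become a current research focus. This paper surveys weakly supervised object detection algorithms based on image-level labels. It first introduces the development of object detection, describing detection algorithms mainly from the perspective of strongly supervised learning and pointing out the limitations of their training data. It then analyzes weakly supervised object detection methods from three aspects: image segmentation, multi-instance learning, and convolutional neural networks, and describes multi-instance learning and convolutional neural networks in detail from perspectives such as saliency learning and multi-network collaborative learning. Finally, it compares several mainstream weakly supervised methods with one another through experiments, and with current mainstream strongly supervised object detection algorithms. The experimental results show that weakly supervised learning has made great progress: the application of convolutional neural networks has greatly advanced weakly supervised object detection and has gradually replaced traditional multi-instance learning methods, and with joint algorithms in particular the accuracy on Pascal VOC 2007 has improved markedly, reaching 79.3%. Because its performance is still lower than that of strongly supervised object detection, weakly supervised object detection still has considerable room for development. Joint algorithms based on convolutional neural networks are gradually becoming the mainstream approach to weakly supervised object detection.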

Key words: Multi-instance learning, Convolutional neural network, Object detection, Weakly supervised learning, Image segmentation

Abstract: Object detection is one of the fundamental problems in computer vision. Currently, supervised learning-based object detection is one of the mainstream approaches. In existing research, high-precision image labels are a precondition for supervised object detection to achieve good performance. However, accurate labels are increasingly difficult to obtain because of the complexity of backgrounds and the variety of objects in real scenes. With the development of deep learning, how to achieve good performance with low-cost image labels has become a key question in this field. This paper surveys object detection algorithms based on weakly supervised learning with image-level labels. First, it describes the background of object detection and points out the shortcomings of its training data. Then it reviews weakly supervised object detection algorithms based on image-level labels from three aspects: image segmentation, multi-instance learning, and convolutional neural networks. Multi-instance learning and convolutional neural networks are described in detail from perspectives such as saliency learning and collaborative learning. Finally, it compares mainstream weakly supervised algorithms with one another and with supervised object detection algorithms. The results show that weakly supervised object detection has made great progress; in particular, convolutional neural networks have greatly promoted its development and gradually replaced multi-instance learning. With fusion algorithms, accuracy on Pascal VOC 2007 rises markedly to 79.3%. However, weakly supervised detection still performs worse than supervised object detection. Fusion algorithms based on convolutional neural networks are becoming the mainstream approach in weakly supervised object detection.

Key words: Convolutional neural network, Image segmentation, Multi-instance learning, Object detection, Weakly supervised learning

CLC number:

TP391.4

周小龙, 陈小佳, 陈胜勇, 雷帮军. 弱监督学习下的目标检测算法综述[J]. 计算机科学, 2019, 46(11): 49-57. https://doi.org/10.11896/jsjkx.181001899

ZHOU Xiao-long, CHEN Xiao-jia, CHEN Sheng-yong, LEI Bang-jun. Weakly Supervised Learning-based Object Detection:A Survey[J]. Computer Science, 2019, 46(11): 49-57. https://doi.org/10.11896/jsjkx.181001899
