Computer Science, 2019, Vol. 46, Issue (11): 49-57. doi: 10.11896/jsjkx.181001899

• Surveys •

Weakly Supervised Learning-based Object Detection: A Survey

ZHOU Xiao-long1,2, CHEN Xiao-jia1, CHEN Sheng-yong1, LEI Bang-jun2   

  1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
  2. Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering (China Three Gorges University), Yichang, Hubei 443002, China
  • Received: 2018-10-12   Online: 2019-11-15   Published: 2019-11-14

Abstract: Object detection is one of the fundamental problems in computer vision. Supervised learning-based methods are currently among the mainstream approaches to object detection. In existing research, high-precision image labels are a precondition for supervised object detection to perform well. However, accurate labels are becoming increasingly difficult to obtain because of complex backgrounds and the variety of objects in real scenes. With the development of deep learning, how to achieve good performance with low-cost image labels has become a key question in this field. This paper mainly introduces object detection algorithms based on weakly supervised learning with image-level labels. First, it describes the background of object detection and points out the shortcomings of the training data. Then, it reviews weakly supervised object detection algorithms based on image-level labels from three aspects: image segmentation, multi-instance learning and convolutional neural networks. Multi-instance learning and convolutional neural networks are discussed in detail from several angles, such as saliency learning and collaborative learning. Finally, the paper compares mainstream weakly supervised algorithms with one another and with object detection algorithms based on supervised learning. The results show that weakly supervised object detection has made great progress; in particular, convolutional neural networks have greatly promoted its development and are gradually replacing multi-instance learning. With a fusion algorithm, the accuracy rate increases markedly to 79.3% on Pascal VOC 2007. However, weakly supervised object detection still performs worse than supervised object detection. To achieve better performance, fusion algorithms based on convolutional neural networks are becoming the mainstream in weakly supervised object detection.

Key words: Convolutional neural network, Image segmentation, Multi-instance learning, Object detection, Weakly supervised learning

CLC Number: 

  • TP391.4