Computer Science (计算机科学), 2019, Vol. 46, Issue (11): 49-57. doi: 10.11896/jsjkx.181001899
周小龙1,2, 陈小佳1, 陈胜勇1, 雷帮军2
ZHOU Xiao-long1,2, CHEN Xiao-jia1, CHEN Sheng-yong1, LEI Bang-jun2
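Abstract: Object detection is one of the fundamental problems in computer vision, and detection algorithms based on supervised learning are currently the mainstream approach. In existing research, high-precision image annotation is a prerequisite for strongly supervised object detection to perform well. However, factors such as the complexity of backgrounds and the diversity of objects in real scenes make image annotation very time-consuming and labor-intensive. With the continued development of deep learning, how to obtain good training results from low-cost image annotation has become a current research focus. This paper surveys weakly supervised object detection algorithms based on image-level labels. It first reviews the development of object detection, describing detection algorithms mainly in the strongly supervised setting and pointing out the limitations of their training data. It then analyzes weakly supervised object detection methods from three aspects: image segmentation, multi-instance learning, and convolutional neural networks, and describes multi-instance learning and convolutional neural networks in detail from perspectives such as saliency learning and multi-network collaborative learning. Finally, experiments compare several mainstream weakly supervised methods against one another and against current mainstream strongly supervised object detection algorithms. The experimental results show that weakly supervised learning has made great progress, and that the application of convolutional neural networks has greatly advanced weakly supervised object detection and is gradually replacing traditional multi-instance learning methods. In particular, after joint algorithms were adopted, accuracy on Pascal VOC 2007 improved significantly, reaching 79.3%. However, because this performance is still lower than that of strongly supervised object detection algorithms, weakly supervised object detection still has considerable room for development. Joint algorithms based on convolutional neural networks are gradually becoming the mainstream approach to object detection under weak supervision.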