Study of BoF Model Based Image Representation

Abstract

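Abstract: Designing a suitable image representation is one of the most important problems in computer vision. The BoF (bag-of-features) representation is very popular and has been widely applied to image classification, object recognition, image retrieval, robot localization, and texture recognition. A BoF feature represents an image as an orderless collection of features. Although this approach lacks structural and spatial information, it is conceptually simple and computationally cheap, and in some applications its results are comparable to those of the best current methods. This paper examines the BoF model closely, focusing on the typical techniques involved in its three stages: local feature extraction, feature quantization and coding, and feature pooling. Finally, based on an analysis of the various research approaches, it summarizes the open problems in current research and possible directions for development.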

Keywords: bag of features, local features, feature quantization, feature pooling, computer vision

CLC number: TP317.4    Document code: A

Abstract: Designing a suitable image representation is one of the most fundamental issues in computer vision. The BoF model is very popular and is used extensively in image classification, video search, robot localization, and texture recognition. A BoF feature is an orderless collection of quantized local image descriptors. Although this representation discards structural and spatial information, the BoF model is conceptually and computationally simple, and in some applications it performs as well as state-of-the-art methods. The three steps of the popular BoF model are studied in detail: feature extraction, feature coding, and feature pooling. Finally, the main problems and challenges are highlighted based on an analysis of current research techniques.

Key words: BoF, local features, feature quantization, feature pooling, computer vision
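
The three stages named in the abstract (local feature extraction, feature coding, feature pooling) can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`kmeans`, `encode`, `pool`, `bof_vector`) are hypothetical, the 2-D points stand in for real local descriptors such as SIFT, and hard assignment with average pooling is only the simplest of the coding and pooling choices that the survey discusses.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(vector, centers):
    """Index of the center closest to the vector."""
    return min(range(len(centers)), key=lambda j: sq_dist(vector, centers[j]))


def kmeans(points, k, iters=10):
    """Plain Lloyd's k-means; starting from the first k points is an arbitrary choice."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_index(p, centers)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def encode(descriptor, codebook):
    """Hard-assignment coding: one-hot vector marking the nearest visual word."""
    hit = nearest_index(descriptor, codebook)
    return [1.0 if j == hit else 0.0 for j in range(len(codebook))]


def pool(codes):
    """Average pooling: the mean code, i.e. a normalized visual-word histogram."""
    return [sum(col) / len(codes) for col in zip(*codes)]


def bof_vector(descriptors, codebook):
    """Orderless image representation: encode each descriptor, then pool."""
    return pool([encode(d, codebook) for d in descriptors])


# Stage 1 (feature extraction) is replaced here by made-up 2-D descriptors.
training = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [1.0, 0.0], [10.0, 9.0], [9.0, 10.0]]
codebook = kmeans(training, k=2)            # learn two visual words
image = [[0.5, 0.5], [9.5, 9.5], [0.2, 0.1], [0.0, 0.3]]
histogram = bof_vector(image, codebook)     # stages 2 and 3
print(histogram)  # [0.75, 0.25]
```

Three of the four toy descriptors fall nearest the first visual word, so the pooled vector is `[0.75, 0.25]`. Reordering the descriptors leaves the result unchanged, which is the loss of spatial information that the abstract mentions.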

Liang Ye, Yu Jian, Liu Hongzhe. Study of BoF Model Based Image Representation[J]. Computer Science (计算机科学), 2014, 41(2): 36-44. https://doi.org/

LIANG Ye, YU Jian and LIU Hong-zhe. Study of BoF Model Based Image Representation[J]. Computer Science, 2014, 41(2): 36-44. https://doi.org/

