计算机科学 ›› 2021, Vol. 48 ›› Issue (2): 148-152.doi: 10.11896/jsjkx.191200104

• 计算机图形学&多媒体 • 上一篇    下一篇

基于特征聚类的轻量级图像搜索系统

王晓飞1, 周超2, 刘利刚1   

  1. 1 中国科学技术大学数学科学学院 合肥230000
    2 腾讯计算机系统有限公司 广东 深圳518057
  • 收稿日期:2019-12-17 修回日期:2020-05-18 出版日期:2021-02-15 发布日期:2021-02-04
  • 通讯作者: 刘利刚(lgliu@ustc.edu.cn)
  • 作者简介:wxf9545@mail.ustc.edu.cn
  • 基金资助:
    国家自然科学基金(61672482)

Lightweight Image Retrieval System Based on Feature Clustering

WANG Xiao-fei1, ZHOU Chao2, LIU Li-gang1   

  1. 1 School of Mathematical Sciences,University of Science and Technology of China,Hefei 230000,China
    2 Tencent Computer Systems Co.,Ltd.,Shenzhen,Guangdong 518057,China
  • Received:2019-12-17 Revised:2020-05-18 Online:2021-02-15 Published:2021-02-04
  • About author:WANG Xiao-fei,born in 1995,postgra-duate.His main research interests include image processing and model processing.
    LIU Li-gang,born in 1975,Ph.D,professor,Ph.D supervisor,is a member of China Computer Federation.His main research interests include geometry modeling,computational fabrication and shape analysis.
  • Supported by:
    The National Natural Science Foundation of China (61672482).

摘要: 在图像搜索的场景中,由于搜索请求的随机性,为了提高搜索速度,搜索算法运行时需要把整个数据集预先载入到运行内存。由于运行内存价格远高于同容量的硬盘价格,降低运行内存自然可以大大降低图像搜索服务的成本,但如果直接对数据进行压缩,往往会极大地损失搜索精度。在这种情况下,文中提出了一种基于图像内容特征的分块式图像搜索框架。先利用神经网络的方法来预先提取图片特征,在不对特征进行量化压缩的前提下,采用一种启发式的聚类方法对数据进行分块,同时保证每个数据块的数据之间有一定的相似性。对于每个数据块,采用基于图结构的HNSW算法来构建索引子图以加速图片查询。在该框架下,通过控制查询时访问的数据块的个数,可以在保证精度的前提下大大减少算法所需要的运行内存容量。

关键词: 近似最近邻匹配, 聚类, 图像检索, 图像特征提取, 相似搜索

Abstract: In the scene of image search,due to the randomness of search request,in order to increase the search speed,it is often necessary to preload the entire data set into the running memory.Because the price of running memory with the same capacity is much higher than that of hard disk,reducing the running memory can greatly reduce the cost of image search service.However,if the data is compressed directly,the search accuracy will be greatly reduced.In this case,this paper proposes a content-based ima-ge search framework,which divides data set into groups.Firstly,the neural network is used to extract image features.On the premise of not compressing the features,a heuristic clustering method is used to group the data,ensuring that there is a certain similarity between the data of each data group.For each data group,HNSW algorithm based on graph structure is used to construct index to speed up image query.In this framework,by controlling the number of data blocks accessed during query,the running memory capacity required by the algorithm can be greatly reduced,under the premise of ensuring the accuracy.

Key words: Approximate nearest neighbor matching, Clustering, Image feature extraction, Image Retrieval, Similarity search

中图分类号: 

  • TP391
[1] COX I J,MILLER M L,MINKA T P,et al.The Bayesian image retrieval system,PicHunter:theory,implementation,and psychophysical experiments[J].IEEE Transactions on Image Processing,2002,9(1):20-37.
[2] KIRANYAZ S,GABBOUJ M.Hierarchical CellularTree:AnEfficient Indexing Scheme for Content-Based Retrieval on Multimedia Databases[J].IEEE Transactions on Multimedia,2007,9(1):102-119.
[3] BABENKO A,LEMPITSKY V.Aggregating Deep Convolutional Features for Image Retrieval[J].arXiv:1510.07493v1,2015.
[4] BABENKO A,SLESAREV A,CHIGORIN A,et al.Neuralcodes for image retrieval[C]//European Conference on Computer Vision.Cham:Springer,2014:584-599.
[5] JEGOU H,DOUZE M,SCHMID C,et al.Aggregating local descriptors into a compact image representation[C]//Computer Vision and Pattern Recognition (CVPR),2010 IEEE Conference on.IEEE,2010.
[6] PERRONNIN F,LIU Y,JORGE S,et al.Large-scale image retrieval with compressed Fisher vectors[C]//CVPR.IEEE,2010.
[7] HERVÉ J,ANDREW Z.Triangulation Embedding and Democratic Aggregation for Image Search[C]//In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR'14).IEEE Computer Society,Washington,DC,USA,3310-3317.
[8] SYED S H,MIROSLAW B.Improving Large-Scale Image Retrieval Through Robust Aggregation of Local Descriptors[J].IEEE Transactions on Pattern Analysis & Machine Intelligence,2017,39(9):1783-1796.
[9] DO T T,TAN D K L,PHAM T T,et al.Simultaneous Feature Aggregating and Hashing for Large-Scale Image Search[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).IEEE,2017.
[10] BENTLEY J L.Multidimensional binary search trees used for associative searching[J].Com. of ACM,1975,18(9):509-517.
[11] FINKEL R A,BENTLEY J L.Quad trees:a data structure for retrieval on composite keys[J].Acta Inf.1974,4(1):1-9.
[12] LEE D T,WONG C K.Worst-case analysis for region and partial region searches in multidimensional binary search trees and balanced quad trees[J].Acta Inf.1977,9(1):23-29.
[13] MALKOV Y,PONOMARENKO A,LOGVINOVA,et al.Ap-proximate nearest neighbor algorithm based on navigable small world graphs[J].Information Systems,2014,45:61-68.
[14] MALKOV Y A,YASHUNIN D A.Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2016,PP(99).
[15] MATTHIJS D,ALEXANDRE S,HERVE J.Link and Code:Fast Indexing with Graphs and Compact Regression Codes[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).IEEE,2018.
[16] MARTIN A,BERNHARDSSON E,FAITHFULL A.ANN-Benchmarks:A Benchmarking Tool for Approximate Nearest Neighbor Algorithms[C]//International Conference on Simila-rity Search and Applications.2017.
[17] MACQUEEN J.Some Methods for Classification and Analysis of Multi Variate Observations[C]//Proc.of Berkeley Sympo-sium on Mathematical Statistics & Probability.1965.
[18] LI Y,CHUNG S M.Parallel bisecting k-means with prediction clusteringalgorithm[J].Journal of Supercomputing,2007,39(1):19-37.
[19] MCCALLUM A,NIGAM K,UNGAR L H.Efficient clustering of high-dimensional data sets with application to reference ma-tching[C]//Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining.DBLP,2000.
[20] ARTHUR D,VASSILVITSKII S.k-means++:the advantages of careful seeding[C]//Eighteenth Acm-Siam Symposium on Discrete Algorithms.Society for Industrial and Applied Mathematics.2007:1027-1035.
[21] SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale image recognition[J].arXiv:1409.1556,2014.
[22] JIA Y,SHELHAMER E,DONAHUE J,et al.Caffe:Convolutional architecture for fast feature embedding[C]//Proceedings of the 22nd ACM International Conference on Multimedia.ACM,2014:675-678.
[1] 鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩.
基于分层抽样优化的面向异构客户端的联邦学习
Federated Learning Based on Stratified Sampling Optimization for Heterogeneous Clients
计算机科学, 2022, 49(9): 183-193. https://doi.org/10.11896/jsjkx.220500263
[2] 柴慧敏, 张勇, 方敏.
基于特征相似度聚类的空中目标分群方法
Aerial Target Grouping Method Based on Feature Similarity Clustering
计算机科学, 2022, 49(9): 70-75. https://doi.org/10.11896/jsjkx.210800203
[3] 鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩.
基于DBSCAN聚类的集群联邦学习方法
Clustered Federated Learning Methods Based on DBSCAN Clustering
计算机科学, 2022, 49(6A): 232-237. https://doi.org/10.11896/jsjkx.211100059
[4] 郁舒昊, 周辉, 叶春杨, 王太正.
SDFA:基于多特征融合的船舶轨迹聚类方法研究
SDFA:Study on Ship Trajectory Clustering Method Based on Multi-feature Fusion
计算机科学, 2022, 49(6A): 256-260. https://doi.org/10.11896/jsjkx.211100253
[5] 毛森林, 夏镇, 耿新宇, 陈剑辉, 蒋宏霞.
基于密度敏感距离和模糊划分的改进FCM算法
FCM Algorithm Based on Density Sensitive Distance and Fuzzy Partition
计算机科学, 2022, 49(6A): 285-290. https://doi.org/10.11896/jsjkx.210700042
[6] 陈景年.
一种适于多分类问题的支持向量机加速方法
Acceleration of SVM for Multi-class Classification
计算机科学, 2022, 49(6A): 297-300. https://doi.org/10.11896/jsjkx.210400149
[7] 刘丽, 李仁发.
医疗CPS协作网络控制策略优化
Control Strategy Optimization of Medical CPS Cooperative Network
计算机科学, 2022, 49(6A): 39-43. https://doi.org/10.11896/jsjkx.210300230
[8] 陈佳舟, 赵熠波, 徐阳辉, 马骥, 金灵枫, 秦绪佳.
三维城市场景中的小物体检测
Small Object Detection in 3D Urban Scenes
计算机科学, 2022, 49(6): 238-244. https://doi.org/10.11896/jsjkx.210400174
[9] 邢云冰, 龙广玉, 胡春雨, 忽丽莎.
基于SVM的类别增量人体活动识别方法
Human Activity Recognition Method Based on Class Increment SVM
计算机科学, 2022, 49(5): 78-83. https://doi.org/10.11896/jsjkx.210400024
[10] 朱哲清, 耿海军, 钱宇华.
面向化学结构的线段聚类算法
Line-Segment Clustering Algorithm for Chemical Structure
计算机科学, 2022, 49(5): 113-119. https://doi.org/10.11896/jsjkx.210700131
[11] 张宇姣, 黄锐, 张福泉, 隋栋, 张虎.
基于菌群优化的近邻传播聚类算法研究
Study on Affinity Propagation Clustering Algorithm Based on Bacterial Flora Optimization
计算机科学, 2022, 49(5): 165-169. https://doi.org/10.11896/jsjkx.210800218
[12] 左园林, 龚月姣, 陈伟能.
成本受限条件下的社交网络影响最大化方法
Budget-aware Influence Maximization in Social Networks
计算机科学, 2022, 49(4): 100-109. https://doi.org/10.11896/jsjkx.210300228
[13] 韩洁, 陈俊芬, 李艳, 湛泽聪.
基于自注意力的自监督深度聚类算法
Self-supervised Deep Clustering Algorithm Based on Self-attention
计算机科学, 2022, 49(3): 134-143. https://doi.org/10.11896/jsjkx.210100001
[14] 杨旭华, 王磊, 叶蕾, 张端, 周艳波, 龙海霞.
基于节点相似性和网络嵌入的复杂网络社区发现算法
Complex Network Community Detection Algorithm Based on Node Similarity and Network Embedding
计算机科学, 2022, 49(3): 121-128. https://doi.org/10.11896/jsjkx.210200009
[15] 蒲实, 赵卫东.
一种面向动态科研网络的社区检测算法
Community Detection Algorithm for Dynamic Academic Network
计算机科学, 2022, 49(1): 89-94. https://doi.org/10.11896/jsjkx.210100023
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!