Computer Science ›› 2026, Vol. 53 ›› Issue (4): 24-32.doi: 10.11896/jsjkx.251000037
• Interdisciplinary Integration of Artificial Intelligence and Theoretical Computer Science • Previous Articles Next Articles
GAO Guichen, JIANG Shaofeng
CLC Number:
| [1]JÉGOU H,DOUZE M,SCHMID C.Product Quantization forNearest Neighbor Search[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2011,33(1):117-128. [2]PENNINGTON J,SOCHER R,MANNING C D.Glove:Global Vectors for Word Representation[C]//Proceedings of Conference on Empirical Methods in Natural Language Processing.ACL,2014:1532-1543. [3]BABENKO A,LEMPITSKY V S.Efficient Indexing of Billion-Scale Datasets of Deep Descriptors[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Las Vegas:IEEE Computer Society,2016:2055-2063. [4]MAHAJAN M,NIMBHORKAR P,VARADARAJAN K R.The Planar k-Means Problem is NP-hard[J].Theoretical Computer Science,2012,442:13-21. [5]INABA M,KATOH N,IMAI H.Applications of WeightedVoronoi Diagrams and Randomization to Variance-Based k-Clustering[C]//Proceedings of Annual Symposium on Computational Geometry.New York:ACM,1994:332-339. [6]MATOUŠEK J.On Approximate Geometric k-Clustering[J].Discrete & Computational Geometry,2000,24(1):61-84. [7]DE LA VEGA W F,KARPINSKI M,KENYON C,et al.Approximation Schemes for Clustering Problems[C]//Procee-dings of Annual ACM Symposium on Theory of Computing.New York:ACM,2003:50-58. [8]HAR-PELED S,MAZUMDAR S.On Coresets for k-Means andk-Median Clustering[C]//Proceedings of Annual ACM Symposium on Theory of Computing.New York:ACM,2004:291-300. [9]FELDMAN D,MONEMIZADEH M,SOHLER C.A PTAS fork-Means Clustering Based on Weak Coresets[C]//Proceedings of ACM Symposium on Computational Geometry.New York:ACM,2007:11-18. [10]CHEN K.On Coresets for k-Median and k-Means Clustering in Metric and Euclidean Spaces and Their Applications[J].SIAM Journal on Computing,2009,39(3):923-947. [11]KUMAR A,SABHARWAL Y,SEN S.Linear-Time Approxi-mation Schemes for Clustering Problems in Any Dimensions[J].Journal of the ACM,2010,57(2):5:1-5:32. [12]COHEN-ADDAD V.A Fast Approximation Scheme for Low-Dimensional k-Means[C]//Proceedings of Annual ACM-SIAM Symposium on Discrete Algorithms.New Orleans:SIAM,2018:430-440. [13]FRIGGSTAD Z,REZAPOUR M,SALAVATIPOUR M R.Local Search Yields a PTAS for k-Means in Doubling Metrics[J].SIAM Journal on Computing,2019,48(2):452-480. [14]COHEN-ADDAD V,KLEIN P N,MATHIEU C.Local Search Yields Approximation Schemes for k-Means and k-Median in Euclidean and Minor-Free Metrics[J].SIAM Journal on Computing,2019,48(2):644-667. [15]ABBASI F,BANERJEE S,BYRKA J,et al.ParameterizedApproximation Schemes for Clustering with General Norm Objectives[C]//Proceedings of IEEE Annual Symposium on Foundations of Computer Science.IEEE,2023:1377-1399. [16]COHEN-ADDAD V,FELDMANN A E,SAULPIC D.Near-Linear Time Approximation Schemes for Clustering in Doubling Metrics[J].Journal of the ACM,2021,68(6):44:1-44:34. [17]AWASTHI P,CHARIKAR M,KRISHNASWAMY R,et al.The Hardness of Approximation of Euclidean k-Means[C]//Proceedings of International Symposium on Computational Geometry.Eindhoven:Schloss Dagstuhl-Leibniz-Zentrum für Informatik,2015:754-767. [18]LLOYD S P.Least Squares Quantization in PCM[J].IEEETransactions on Information Theory,1982,28(2):129-136. [19]ARTHUR D,VASSILVITSKII S.k-Means++:the Advantages of Careful Seeding[C]//Proceedings of Annual ACM-SIAM Symposium on Discrete Algorithms.New Orleans:SIAM,2007:1027-1035. [20]METTU R R,PLAXTON C G.Optimal Time Bounds for Approximate Clustering[J].Machine Learning,2004,56(1/3):35-60. [21]BACHEM O,LUCIC M,HASSANI S H,et al.Fast and Pro-vably Good Seedings for k-Means[C]//Proceeding of Annual Conference on Neural Information Processing Systems.Barcelona,2016:55-63. [22]LATTANZI S,SOHLER C.A Better k-Means++ Algorithm via Local Search[C]//Proceedings of International Conference on Machine Learning.Long Beach:PMLR,2019:3662-3671. [23]CHOO D,GRUNAU C,PORTMANN J,et al.k-Means++:Few More Steps Yield Constant Approximation[C]//Procee-dings of International Conference on Machine Learning.Virtual Event:PMLR,2020:1909-1917. [24]HUANG J,FENG Q,HUANG Z,et al.FLS:A New LocalSearch Algorithm for k-Means with Smaller Search Space[C]//Proceedings of International Joint Conference on Artificial Intelligence.Vienna:ijcai.org,2022:3092-3098. [25]BERETTA L,COHEN-ADDAD V,LATTANZI S,et al.Multi-Swap k-Means++[C]//Proceeding of Annual Conference on Neural Information Processing Systems.2023. [26]HUANG J,FENG Q,HUANG Z,et al.Linear Time Algorithms for k-Means with Multi-Swap Local Search[C]//Proceeding of Annual Conference on Neural Information Processing Systems.2023. [27]FAN C,LI P,LI X.LSDS++:Dual Sampling for Accelerated k-Means++[C]//Proceedings of International Conference on Machine Learning.PMLR,2023:9640-9649. [28]HUANG J,FENG Q,ZHANG Z,et al.Fast Local Search Algorithms for Clustering with Adaptive Sampling and Bandit Strategies[C]//Proceeding of Annual Conference on Neural Information Processing Systems.2025. [29]BECCHETTI L,BURY M,COHEN-ADDAD V,et al.Oblivious Dimension Reduction for k-Means:Beyond Subspaces and the Johnson-Lindenstrauss Lemma[C]//Proceedings of Annual ACM SIGACT Symposium on Theory of Computing.New York:ACM,2019:1039-1050. [30]BOUTSIDIS C,ZOUZIAS A,DRINEAS P.Random Projections for k-Means Clustering[C]//Proceedings of Annual Conference on Neural Information Processing Systems.Curran Associates Inc.,2010:298-306. [31]COHEN M B,ELDER S,MUSCO C,et al.Dimensionality Reduction for k-Means Clustering and Low Rank Approximation[C]//Proceedings of Annual ACM on Symposium on Theory of Computing.New York:ACM,2015:163-172. [32]MAKARYCHEV K,MAKARYCHEV Y,RAZENSHTEYN I.Performance of Johnson-Lindenstrauss Transform for k-Means and k-Medians Clustering[C]//Proceedings of Annual ACM SIGACT Symposium on Theory of Computing.New York:ACM,2019:1027-1038. [33]JOHNSON W B,LINDENSTRAUSS J.Extensions of Lipschitz Mappings into Hilbert Space[J].Contemporary mathematics,1984,26(1):189-206. [34]LANGBERG M,SCHULMAN L J.Universal Epsilon-Approximators for Integrals[C]//Proceedings of Annual ACM-SIAM Symposium on Discrete Algorithms.SIAM,2010:598-607. [35]FELDMAN D,LANGBERG M.A Unified Framework for Approximating and Clustering Data[C]//Proceedings of ACM Symposium on Theory of Computing.New York:ACM,2011:569-578. [36]DRAGANOV A,SAULPIC D,SCHWIEGELSHOHN C.Set-tling Time vs.Accuracy Tradeoffs for Clustering Big Data[J].Proceedings of the ACM on Management of Data,2024,2(3):173. [37]HAR-PELED S,KUSHAL A.Smaller Coresets for k-Medianand k-Means Clustering[J].Discrete & Computational Geometry,2007,37(1):3-19. [38]BACHEM O,LUCIC M,KRAUSE A.Scalable k-Means Clustering via Lightweight Coresets[C]//Proceedings of ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.New York:ACM,2018:1119-1127. [39]HUANG L,JIANG S H,LI J,et al.Epsilon-Coresets for Cluste-ring(with Outliers) in Doubling Metrics[C]//Proceedings of IEEE Annual Symposium on Foundations of Computer Science.IEEE Computer Society,2018:814-825. [40]SOHLER C,WOODRUFF D P.Strong Coresets for k-Medianand Subspace Approximation:Goodbye Dimension[C]//Proceedings of IEEE Annual Symposium on Foundations of Computer Science.IEEE Computer Society,2018:802-813. [41]FELDMAN D,SCHMIDT M,SOHLER C.Turning Big Data into Tiny Data:Constant-Size Coresets for k-Means,PCA,and Projective Clustering[J].SIAM Journal on Computing,2020,49(3):601-657. [42]HUANG L,VISHNOI N K.Coresets for Clustering in Euclidean Spaces:Importance Sampling is Nearly Optimal[C]//Proceedings of Annual ACM SIGACT Symposium on Theory of Computing.New York:ACM,2020:1416-1429. [43]BRAVERMAN V,JIANG S H,KRAUTHGAMER R,et al.Coresets for Clustering in Excluded-Minor Graphs and Beyond[C]//Proceedings of ACM-SIAM Symposium on Discrete Algorithms.SIAM,2021:2679-2696. [44]COHEN-ADDAD V,SAULPIC D,SCHWIEGELSHOHN C.A New Coreset Framework for Clustering[C]//Proceedings of Annual ACM SIGACT Symposium on Theory of Computing.New York:ACM,2021:169-182. [45]COHEN-ADDAD V,LARSEN K G,SAULPIC D,et al.To-wards Optimal Lower Bounds for k-Median and k-Means Coresets[C]//Proceedings of Annual ACM SIGACT Symposium on Theory of Computing.New York:ACM,2022:1038-1051. [46]COHEN-ADDAD V,LARSEN K G,SAULPIC D,et al.Im-proved Coresets for Euclidean k-Means[C]//Proceedings of Annual Conference on Neural Information Processing Systems.2022. [47]HUANG L,LI J,WU X.On Optimal Coreset Construction forEuclidean (k,z)-Clustering[C]//Proceedings of Annual ACM Symposium on Theory of Computing.New York:ACM,2024:1594-1604. [48]BANSAL N,COHEN-ADDAD V,PRABHU M,et al.Sensitivity Sampling for k-Means:Worst Case and Stability Optimal Coreset Bounds[C]//Proceedings of IEEE Annual Symposium on Foundations of Computer Science.IEEE,2024:1707-1723. [49]COHEN-ADDAD V,DRAGANOV A,RUSSO M,et al.A Tight VC-Dimension Analysis of Clustering Coresets with Applications[C]//Proceedings of Annual ACM-SIAM Symposium on Discrete Algorithms.SIAM,2025:4783-4808. [50]JAIN K,VAZIRANI V V.Approximation Algorithms for Me-tric Facility Location and k-Median Problems Using the Primal-Dual Schema and Lagrangian Relaxation[J].Journal of the ACM,2001,48(2):274-296. [51]KANUNGO T,MOUNT D M,NETANYAHU N S,et al.ALocal Search Approximation Algorithm for k-Means Clustering[J].Computational Geometry,2004,28(2/3):89-112. [52]GUPTA A,TANGWONGSAN K.Simpler Analyses of LocalSearch Algorithms for Facility Location[J].arXiv:0809.2554,2008. [53]AHMADIAN S,NOROUZI-FARD A,SVENSSON O,et al.Better Guarantees for k-Means and Euclidean k-Median by Primal-Dual Algorithms[C]//Proceedings of IEEE Annual Symposium on Foundations of Computer Science.IEEE Computer Society,2017:61-72. [54]GRANDONI F,OSTROVSKY R,RABANI Y,et al.A Refined Approximation for Euclidean k-Means[J].Information Proces-sing Letters,2022,176:106251. [55]COHEN-ADDAD V,ESFANDIARI H,MIRROKNI V S,et al.Improved Approximations for Euclidean k-Means and k-Median,via Nested Quasi-Independent Sets[C]//Proceedings of Annual ACM SIGACT Symposium on Theory of Computing.New York:ACM,2022:1621-1628. [56]CHARIKAR M,COHEN-ADDAD V,GAO R,et al.An Im-proved Greedy Approximation for(Metric) k-Means[C]//Proceedings of IEEE Annual Symposium on Foundations of Computer Science.IEEE Computer Society,2025. [57]BADOIU M,CZUMAJ A,INDYK P,et al.Facility Location in Sublinear Time[C]//Proceedings of Automata,Languages and Programming,International Colloquium.Springer,2005:866-877. [58]INDYK P,MOTWANI R.Approximate Nearest Neighbors:Towards Removing the Curse of Dimensionality[C]//Proceedings of Annual ACM Symposium on the Theory of Computing.New York:ACM,1998:604-613. [59]HAR-PELED S,INDYK P,MOTWANI R.Approximate Nea-rest Neighbor:Towards Removing the Curse of Dimensionality[J].Theory of Computing,2012,8(1):321-350. [60]ANDONI A,INDYK P.Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions[C]//Proceedings of Annual IEEE Symposium on Foundations of Computer Science.IEEE Computer Society,2006:459-468. [61]ANDONI A,INDYK P.Near-Optimal Hashing Algorithms forApproximate Nearest Neighbor in High Dimensions[J].Communications of the ACM,2008,51(1):117-122. [62]ANDONI A,RAZENSHTEYN I P.Optimal Data-DependentHashing for Approximate Near Neighbors[C]//Proceedings of Annual ACM on Symposium on Theory of Computing.New York:ACM,2015:793-801. [63]GOEL A,INDYK P,VARADARAJAN K R.Reductions among High Dimensional Proximity Problems[C]//Proceedings of Annual Symposium on Discrete Algorithms.Washington:ACM/SIAM,2001:769-778. [64]JIANG S H,JIN Y,LOU J,et al.Local Search for Clustering in Almost-Linear Time[C]//Proceedings of Annual ACM-SIAM Symposium on Discrete Algorithms.SIAM,2026. [65]HAR-PELED S,INDYK P,SIDIROPOULOS A.EuclideanSpanners in High Dimensions[C]//Proceedings of Annual ACM-SIAM Symposium on Discrete Algorithms.SIAM,2013:804-809. [66]THORUP M.Quick k-Median,k-Center,and Facility Locationfor Sparse Graphs[J].SIAM Journal on Computing,2004,34(2):405-432. [67]ARYA V,GARG N,KHANDEKAR R,et al.Local SearchHeuristics for k-Median and Facility Location Problems[J].SIAM Journal on Computing,2004,33(3):544-562. [68]LA TOUR M D,SAULPIC D.Almost-Linear Time Approximation Algorithm to Euclidean k-Median and k-Means[J].arXiv:2407.11217,2024. [69]METTU R R,PLAXTON C G.The Online Median Problem[J].SIAM Journal on Computing,2003,32(3):816-832. [70]COHEN-ADDAD V,LATTANZI S,NOROUZI-FARD A,et al.Fast and Accurate k-Means++ via Rejection Sampling[C]//Proceedings of Annual Conference on Neural Information Processing Systems.2020. [71]CHARIKAR M,HENZINGER M,HU L,et al.Simple,Scalable and Effective Clustering via One-Dimensional Projections[C]//Proceedings of Annual Conference on Neural Information Processing Systems.2023. [72]BHATTACHARYA S,COSTA M,FAROKHNEJAD E,et al.Fully Dynamic Euclidean k-Means[J].arXiv:2507.11256,2025. [73]LATTANZI S,VASSILVITSKII S.Consistent k-Clustering[C]//Proceedings of International Conference on Machine Learning.PMLR,2017:1975-1984. [74]COHEN-ADDAD V,HJULER N,PAROTSIDIS N,et al.Fully Dynamic Consistent Facility Location[C]//Proceedings of Annual Conference on Neural Information Processing Systems.2019:3250-3260. [75]BHATTACHARYA S,COSTA M,LATTANZI S,et al.Fully Dynamic k-Clustering in O~(k) Update Time[C]//Proceedings of Annual Conference on Neural Information Processing Systems.2023. [76]LA TOUR M D,HENZINGER M,SAULPIC D.Fully Dynamic k-Means Coreset in Near-Optimal Update Time[C]//Procee-dings of Annual European Symposium on Algorithms.London:Schloss Dagstuhl-Leibniz-Zentrum für Informatik,2024:100:1-100:16. [77]BHATTACHARYA S,COSTA M,GARG N,et al.Fully Dynamic k-Clustering with Fast Update Time and Small Recourse[C]//Proceedings of IEEE Annual Symposium on Foundations of Computer Science.IEEE,2024:216-227. [78]BHATTACHARYA S,COSTA M,FAROKHNEJAD E.Fully Dynamic k-Median with Near-Optimal Update Time and Recourse[C]//Proceedings of Annual ACM Symposium on Theory of Computing.New York:ACM,2025:1166-1177. [79]BRAVERMAN V,FRAHLING G,LANG H,et al.Clustering High Dimensional Dynamic Data Streams[C]//Proceedings of International Conference on Machine Learning.PMLR,2017:576-585. [80]SONG Z,YANG L F,ZHONG P.Sensitivity Sampling over Dynamic Geometric Data Streams with Applications to k-Clustering[J].arXiv:1802.00459,2018. [81]ACKERMANN M R,LAMMERSEN C,MÄRTENS M,et al.Streamkm++:A Clustering Algorithms for Data Streams[C]//Proceedings of the Twelfth Workshop on Algorithm Engineering and Experiments.SIAM,2010:173-187. [82]FICHTENBERGER H,GILLÉ M,SCHMIDT M,et al.BICO:BIRCH Meets Coresets for k-Means Clustering[C]//Procee-dings of Annual European Symposium on Algorithms.Springer,2013:481-492. [83]AILON N,JAISWAL R,MONTELEONI C.Streaming k-Means Approximation[C]//Proceedings of Annual Conference on Neural Information Processing Systems.Curran Associates Inc.,2009:10-18. [84]ZHANG Y,TANGWONGSAN K,TIRTHAPURA S.Strea-ming k-Means Clustering with Fast Queries[C]//Proceedings of IEEE International Conference on Data Engineering.IEEE Computer Society,2017:449-460. [85]BRAVERMAN V,FELDMAN D,LANG H,et al.StreamingCoreset Constructions for M-Estimators[C]//Approximation,Randomization,and Combinatorial Optimization.Cambridge:Schloss Dagstuhl-Leibniz-Zentrum für Informatik,2019:62:1-62:15. [86]COHEN-ADDAD V,WOODRUFF D P,ZHOU S.StreamingEuclidean k-Median and k-Means with o(logn) Space[C]//Proceedings of IEEE Annual Symposium on Foundations of Computer Science.IEEE,2023:883-908. [87]BRAVERMAN V,LANG H,LEVIN K D,et al.ClusteringProblems on Sliding Windows[C]//Proceedings of Annual ACM-SIAM Symposium on Discrete Algorithms.SIAM,2016:1374-1390. [88]EPASTO A,LATTANZI S,VASSILVITSKII S,et al.Submo-dular Optimization over Sliding Windows[C]//Proceedings of International Conference on World Wide Web.ACM,2017:421-430. [89]EPASTO A,MAHDIAN M,MIRROKNI V S,et al.Improved Sliding Window Algorithms for Clustering and Coverage via Bucketing-Based Sketches[C]//Proceedings of ACM-SIAM Symposium on Discrete Algorithms.SIAM,2022:3005-3042. [90]WOODRUFF D P,ZHONG P,ZHOU S.Near-Optimal k-Clustering in the Sliding Window Model[C]//Proceedings of Annual Conference on Neural Information Processing Systems.2023. [91]KARLOFF H J,SURI S,VASSILVITSKII S.A Model of Computation for Mapreduce[C]//Proceedings of Annual ACM-SIAM Symposium on Discrete Algorithms.SIAM,2010:938-948. [92]DEAN J,GHEMAWAT S.Mapreduce:Simplified Data Proces-sing on Large Clusters[J].Communications of the ACM,2008,51(1):107-113. [93]WHITE T.Hadoop-The Definitive Guide:Storage and Analysis at Internet Scale(4th ed)[M].O’Reilly,2015. [94]ZAHARIA M,CHOWDHURY M,FRANKLIN M J,et al.Spark:Cluster Computing with Working Sets[C]//2nd USENIX Workshop on Hot Topics in Cloud Computing.Boston:USENIX Association,2010. [95]COHEN-ADDAD V,KUHN F,PARSAEIAN Z.An Efficient Massively Parallel Constant-Factor Approximation Algorithm for the k-Means Problem[C]//Proceedings of Annual ACM-SIAM Symposium on Discrete Algorithms.SIAM,2026. [96]BHASKARA A,WIJEWARDENA M.Distributed Clusteringvia LSH Based Data Partitioning[C]//Proceedings of International Conference on Machine Learning.PMLR,2018:569-578. [97]CZUMAJ A,GAO G,JIANG S H,et al.Fully-Scalable MPC Algorithms for Clustering in High Dimension[C]//International Colloquium on Automata,Languages,and Programming.Tallinn:Schloss Dagstuhl-Leibniz-Zentrum für Informatik,2024:50:1-50:20. |
| [1] | LI Zekai, ZHONG Jiaqing, FENG Shaojun, CHEN Juan, DENG Rongyu, XU Tao, TAN Zhengyuan, ZHOU Kexing, ZHU Pengzhi, MA Zhaoyang. CPU Power Modeling Accuracy Improvement Method Based on Training Set Clustering Selection [J]. Computer Science, 2024, 51(9): 59-70. |
| [2] | YAN Xin, HUANG Zhiqiu, SHI Fan, XU Heng. Study on Following Car Model with Different Driving Styles Based on Proximal PolicyOptimization Algorithm [J]. Computer Science, 2024, 51(9): 223-232. |
| [3] | PENG Bo, LI Yaodong, GONG Xianfu. Improved K-means Photovoltaic Energy Data Cleaning Method Based on Autoencoder [J]. Computer Science, 2024, 51(6A): 230700070-5. |
| [4] | XU Tianjie, WANG Pingxin, YANG Xibei. Three-way k-means Clustering Based on Artificial Bee Colony [J]. Computer Science, 2023, 50(6): 116-121. |
| [5] | JIANG Bo, WAN Yi, XIE Xianzhong. Improved YOLOv5s Lightweight Steel Surface Defect Detection Model [J]. Computer Science, 2023, 50(11A): 230900113-7. |
| [6] | CHEN Yuan-yuan, WANG Zhi-hai. Concept Drift Detection Method for Multidimensional Data Stream Based on Clustering Partition [J]. Computer Science, 2022, 49(7): 25-30. |
| [7] | CAO Yang-chen, ZHU Guo-sheng, SUN Wen-he, WU Shan-chao. Study on Key Technologies of Unknown Network Attack Identification [J]. Computer Science, 2022, 49(6A): 581-587. |
| [8] | YANG Xu-hua, WANG Lei, YE Lei, ZHANG Duan, ZHOU Yan-bo, LONG Hai-xia. Complex Network Community Detection Algorithm Based on Node Similarity and Network Embedding [J]. Computer Science, 2022, 49(3): 121-128. |
| [9] | KONG Yu-ting, TAN Fu-xiang, ZHAO Xin, ZHANG Zheng-hang, BAI Lu, QIAN Yu-rong. Review of K-means Algorithm Optimization Based on Differential Privacy [J]. Computer Science, 2022, 49(2): 162-173. |
| [10] | QI Ying, CHAI Yan-mei. High-resolution Remote Sensing Sea Ice Image Segmentation Based on Combination of ImprovedSLIC Algorithm and Clustering Algorithm [J]. Computer Science, 2022, 49(11A): 211200100-6. |
| [11] | JIN Yu-fang, WU Xiang, DONG Hui, YU Li, ZHANG Wen-an. Improved YOLO v4 Algorithm for Safety Helmet Wearing Detection [J]. Computer Science, 2021, 48(11): 268-275. |
| [12] | JIA Hong-jie, WANG Liang-jun, SONG He-ping. HMRF Semi-supervised Approximate Kernel k-means Algorithm [J]. Computer Science, 2019, 46(12): 31-37. |
|
||