Computer Science, 2025, Vol. 52, Issue (11): 71-81. doi: 10.11896/jsjkx.240900160

Database & Big Data & Data Science

Spatial Pyramid Bag of Words Algorithm Based on Persistent Homology

YI Lisha, PENG Ningning   

  1. School of Mathematics and Statistics, Wuhan University of Technology, Wuhan 430070, China
  • Received: 2024-09-26 Revised: 2024-12-23 Online: 2025-11-15 Published: 2025-11-06
  • About author: YI Lisha, born in 2000, postgraduate. Her main research interest is topological data analysis.
    PENG Ningning, born in 1985, Ph.D., associate professor. His main research interests include mathematical logic, computability theory and topological data analysis.
  • Supported by: National Natural Science Foundation of China (11701438).

Abstract: To address the mismatch between the output form of the topological features extracted by persistent homology and the input form expected by common machine learning algorithms, this paper proposes a new algorithmic framework, the Spatial Pyramid Bag of Words Algorithm Based on Persistent Homology (PHSBoW). The algorithm transforms the persistence diagrams (PDs) generated by persistent homology into fixed-length vectors while retaining as much of the topological information contained in the PDs as possible. To improve accuracy and reduce runtime, the paper further develops three algorithms based on PHSBoW, namely PHSsBoW, PHSwBoW, and PHSVLAD, through enhancements such as weight optimization, replacement of the clustering model, and extension of the bag of words model. Each of the four algorithms is combined with a support vector machine for classification and evaluated on nine datasets of varying types and scales. The experimental results indicate that, compared with traditional kernel function algorithms (SWK, PSSK, PWGK) and vectorization algorithms (PBoW, PI, PL), the proposed algorithms improve average classification accuracy by 3.29 to 17.98 percentage points, and their runtime is significantly lower than that of the kernel function algorithms. This demonstrates that these algorithms effectively address the challenge of integrating persistent homology into machine learning while substantially improving both classification accuracy and execution speed.
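The abstract does not spell out how a persistence diagram becomes a fixed-length vector, so the following is only a minimal sketch of the general idea it names, not the authors' PHSBoW: a persistence bag of words (in the spirit of PBoW [17]) pooled over a spatial pyramid (in the spirit of [19]). Everything concrete here is an assumption: diagrams are arrays of (birth, death) pairs, points are mapped to (birth, persistence) coordinates, the codebook comes from plain k-means, level l of the pyramid splits the training bounding box into 2^l × 2^l cells, and the level weights are the standard spatial pyramid matching weights (1/2^L for level 0, 1/2^(L-l+1) for level l). The names `to_birth_persistence`, `kmeans`, `SpatialPyramidBoW`, and `toy_diagram` are hypothetical.

```python
import numpy as np


def to_birth_persistence(diagram):
    """Map (birth, death) pairs to (birth, persistence = death - birth)."""
    d = np.asarray(diagram, dtype=float).reshape(-1, 2)
    return np.column_stack([d[:, 0], d[:, 1] - d[:, 0]])


def kmeans(points, k, n_iter=50, seed=0):
    """Fixed number of Lloyd iterations; returns a (k, 2) array of codewords."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((points[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):  # an empty cluster keeps its previous center
                centers[j] = points[labels == j].mean(axis=0)
    return centers


class SpatialPyramidBoW:
    """Hypothetical spatial-pyramid bag-of-words encoder for persistence diagrams."""

    def __init__(self, n_words=8, levels=2, seed=0):
        self.n_words, self.levels, self.seed = n_words, levels, seed

    def fit(self, diagrams):
        pts = np.vstack([to_birth_persistence(d) for d in diagrams])
        self.codebook_ = kmeans(pts, self.n_words, seed=self.seed)
        # Fixed bounding box from training data, so every vector has the same layout.
        self.lo_, self.hi_ = pts.min(axis=0), pts.max(axis=0)
        return self

    def _encode(self, diagram):
        pts = to_birth_persistence(diagram)
        # Assign each diagram point to its nearest codeword.
        words = np.argmin(((pts[:, None, :] - self.codebook_[None]) ** 2).sum(-1), axis=1)
        # Normalize coordinates to [0, 1) using the training bounding box.
        span = np.where(self.hi_ > self.lo_, self.hi_ - self.lo_, 1.0)
        unit = np.clip((pts - self.lo_) / span, 0.0, 1.0 - 1e-9)
        L, parts = self.levels, []
        for level in range(L + 1):
            cells = 2 ** level
            cell_xy = np.floor(unit * cells).astype(int)  # cell index along each axis
            cell_id = cell_xy[:, 0] * cells + cell_xy[:, 1]
            hist = np.zeros((cells * cells, self.n_words))
            np.add.at(hist, (cell_id, words), 1.0)  # codeword counts per cell
            weight = 1.0 / 2 ** L if level == 0 else 1.0 / 2 ** (L - level + 1)
            parts.append(weight * hist.ravel())
        return np.concatenate(parts)

    def transform(self, diagrams):
        return np.vstack([self._encode(d) for d in diagrams])


def toy_diagram(rng, n=20):
    """Random toy diagram: births in [0, 1), deaths = birth + persistence."""
    births = rng.random(n)
    return np.column_stack([births, births + rng.random(n)])


rng = np.random.default_rng(1)
train = [toy_diagram(rng) for _ in range(5)]
enc = SpatialPyramidBoW(n_words=8, levels=2).fit(train)
X = enc.transform(train)
print(X.shape)  # (5, 168)
```

Each row has length n_words × (1 + 4 + ... + 4^L), independent of how many points a diagram contains, so the rows can be passed to any vector-based classifier, for example scikit-learn's `sklearn.svm.SVC`, which matches the SVM setting described in the abstract. Weighting schemes, alternative clustering models, or a VLAD-style residual encoding (as the abstract's PHSwBoW, PHSsBoW, and PHSVLAD variants suggest) would replace the plain counts and k-means used in this sketch.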

Key words: Persistent homology, Bag of words, Spatial pyramid matching, Machine learning, Persistence diagrams

CLC Number: O189.22

References:
[1]MEHRISH A,MAJUMDER N,BHARADWAJ R,et al.A review of deep learning techniques for speech processing[J].Information Fusion,2023,99:101869.
[2]ORKEN M,DINA O,KEYLAN A,et al.A study of transformer-based end-to-end speech recognition system for Kazakh language[J].Scientific Reports,2022,12(1):8337.
[3]HAN K,WANG Y,CHEN H,et al.A survey on vision transformer[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2022,45(1):87-110.
[4]ZHANG H,SONG H,LI S,et al.A survey of controllable text generation using transformer-based pre-trained language models[J].ACM Computing Surveys,2023,56(3):1-37.
[5]ZOMORODIAN A,CARLSSON G.Computing persistent homology[C]//Proceedings of the Twentieth Annual Symposium on Computational Geometry.2004:347-356.
[6]CARLSSON G.Topology and data[J].Bulletin of the American Mathematical Society,2009,46(2):255-308.
[7]LI Z Q,LI R,SUN K.Power Cable Partial Discharge Pattern Recognition Based on Topological Data Analysis for Time Series[J].Journal of University of Electronic Science and Technology of China,2024,53(3):440-446.
[8]YAN Y K,PENG N N,YI L S.Skewed Time Series Classification Algorithm Based on Persistent Homology[J].Computer Engineering,2024,50(6):110-123.
[9]ANTOSH R,DAS S,THYAGU N N.Characterization of dynamical systems with scanty data using Persistent Homology and Machine Learning[J].arXiv:2408.15834,2024.
[10]SEKULOSKI P,RISTOVSKA V D.Image Classification Using Deep Neural Networks and Persistent Homology[C]//International Conference on ICT Innovations.Cham:Springer,2023:156-170.
[11]PUN C S,LEE S X,XIA K.Persistent-homology-based machine learning:a survey and a comparative study[J].Artificial Intelligence Review,2022,55(7):5169-5213.
[12]REININGHAUS J,HUBER S,BAUER U,et al.A stable multi-scale kernel for topological machine learning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2015:4741-4748.
[13]KUSANO G,HIRAOKA Y,FUKUMIZU K.Persistence weighted Gaussian kernel for topological data analysis[C]//International Conference on Machine Learning.PMLR,2016:2004-2013.
[14]CARRIERE M,CUTURI M,OUDOT S.Sliced Wasserstein kernel for persistence diagrams[C]//International Conference on Machine Learning.PMLR,2017:664-673.
[15]ADAMS H,EMERSON T,KIRBY M,et al.Persistence images:A stable vector representation of persistent homology[J].Journal of Machine Learning Research,2017,18(8):1-35.
[16]BUBENIK P.Statistical topological data analysis using persistence landscapes[J].Journal of Machine Learning Research,2015,16(1):77-102.
[17]ZIELIŃSKI B,LIPIŃSKI M,JUDA M,et al.Persistence codebooks for topological data analysis[J].Artificial Intelligence Review,2021,54:1969-2009.
[18]DONG Z,LIN H,ZHOU C,et al.Persistence B-spline grids:stable vector representation of persistence diagrams based on data fitting[J].Machine Learning,2024,113(3):1373-1420.
[19]GRAUMAN K,DARRELL T.The pyramid match kernel:Discriminative classification with sets of image features[C]//Tenth IEEE International Conference on Computer Vision(ICCV'05).IEEE,2005:1458-1465.
[20]EDELSBRUNNER H,HARER J L.Computational topology:an introduction[M].American Mathematical Society,2022.