计算机科学 ›› 2025, Vol. 52 ›› Issue (11): 71-81.doi: 10.11896/jsjkx.240900160

• 数据库&大数据&数据科学 • 上一篇    下一篇

基于持续同调的空间金字塔词袋算法

易丽莎, 彭宁宁   

  1. 武汉理工大学数学与统计学院 武汉 430070
  • 收稿日期:2024-09-26 修回日期:2024-12-23 出版日期:2025-11-15 发布日期:2025-11-06
  • 通讯作者: 彭宁宁(pengn@whut.edu.cn)
  • 作者简介:(lishayi@whut.edu.cn)
  • 基金资助:
    国家自然科学基金(11701438)

Spatial Pyramid Bag of Words Algorithm Based on Persistent Homology

YI Lisha, PENG Ningning   

  1. School of Mathematics and Statistics,Wuhan University of Technology,Wuhan 430070,China
  • Received:2024-09-26 Revised:2024-12-23 Online:2025-11-15 Published:2025-11-06
  • About author:YI Lisha,born in 2000,postgraduate.Her main research interest is topology data analysis.
    PENG Ningning,born in 1985,Ph.D,associate professor.His main research interests include mathematical logic,computability theory and topology data analysis.
  • Supported by:
    National Natural Science Foundation of China(11701438).

摘要: 为了解决持续同调从数据中提取的拓扑特征输出形式与机器学习算法的常用输入形式不匹配这一难题,提出了一个新的算法框架——基于持续同调的空间金字塔词袋模型(PHSBoW算法)。该算法将持续同调输出的持续性图(PD图)转换为固定长度的向量,同时最大限度地保留PD图中所包含的拓扑特征。为提高算法准确率、降低运行时间,在PHSBoW算法的基础上,通过权重优化、聚类模型替代以及词袋模型扩展等改进,进一步发展了PHSsBoW,PHSwBoW,PHSVLAD 3种算法。通过在不同类型和规模的9个数据集上进行实验,将以上4种算法与支持向量机相结合,对数据进行分类。实验结果表明,与传统核函数算法(SWK,PSSK,PWGK)及向量化算法(PBoW,PI,PL)相比,该方法的分类准确率平均提高了3.29个百分点~17.98个百分点,运行时间相较于核函数算法显著降低。这表明,所提出的算法有效解决了持续同调在机器学习中难以结合的问题,同时显著提高了分类准确率和算法运行速度。

关键词: 持续同调, 词袋模型, 空间金字塔匹配, 机器学习, PD图

Abstract: To address the mismatch between the output form of topological features extracted from persistent homology and the common input form of machine learning algorithms,this paper proposes a new algorithmic framework-Spatial Pyramid Bag of Words Algorithm Based on Persistent Homology(PHSBoW).This algorithm transforms the persistent diagrams(PDs) generated by persistent homology into fixed-length vectors while maximizing the retention of the topological features contained within the PD diagrams.To improve accuracy and reduce runtime,this paper further develops three algorithms-PHSsBoW,PHSwBoW,and PHSVLAD—based on the PHSBoW algorithm through enhancements like weight optimization,substitution with clustering mo-dels,and expansion of the bag of words model.By conducting experiments on nine datasets of varying types and scales,it combines these four algorithms with support vector machines for classification.The experimental results indicate that,compared to traditional kernel function algorithms(SWK,PSSK,PWGK) and vectorization algorithms(PBoW,PI,PL),classification accuracy is improved on average by 3.29 percentage points to 17.98 percentage points,and runtime is significantly reduced relative to kernel function algorithms.This demonstrates that these algorithms effectively address the challenges of integrating persistent homology into machine learning while significantly enhancing classification accuracy and algorithm execution speed.

Key words: Persistence homology, Bag of words, Spatial pyramid matching, Machine learning, Persistence diagrams

中图分类号: 

  • O189.22
[1]MEHRISH A,MAJUMDER N,BHARADWAJ R,et al.A review of deep learning techniques for speech processing[J].Information Fusion,2023,99:101869.
[2]ORKEN M,DINA O,KEYLAN A,et al.A study of transfor-mer-based end-to-end speech recognition system for Kazakh language[J].Scientific Reports,2022,12(1):8337.
[3]HAN K,WANG Y,CHEN H,et al.A survey on vision trans-former[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2022,45(1):87-110.
[4]ZHANG H,SONG H,LI S,et al.A survey of controllable text generation using transformer-based pre-trained language models[J].ACM Computing Surveys,2023,56(3):1-37.
[5]ZOMORODIAN A,CARLSSON G.Computing persistent ho-mology[C]//Proceedings of the Twentieth Annual Symposium on Computational Geometry.2004:347-356.
[6]CARLSSON G.Topology and data[J].Bulletin of the American Mathematical Society,2009,46(2):255-308.
[7]LI Z Q,LI R,SUN K.Power Cable Partial Discharge PatternRecognition Based on Topological Data Analysis for Time Series[J].Journal of University of Electronic Science and Technology of China,2024,53(3):440-446.
[8]YAN Y K,PENG N N,YI L S.Skewed Time Series Classification Algorithm Based on Persistent Homology[J].Computer Engineering,2024,50(6):110-123.
[9]ANTOSH R,DAS S,THYAGU N N.Characterization of dy-namical systems with scanty data using Persistent Homology and Machine Learning[J].arXiv:2408.15834,2024.
[10]SEKULOSKI P,RISTOVSKA V D.Image Classification Using Deep Neural Networks and Persistent Homology[C]//International Conference on ICT Innovations.Cham:Springer,2023:156-170.
[11]PUN C S,LEE S X,XIA K.Persistent-homology-based machine learning:a survey and a comparative study[J].Artificial Intelligence Review,2022,55(7):5169-5213.
[12]REININGHAUS J,HUBER S,BAUER U,et al.A stable multi-scale kernel for topological machine learning[C]//Proceeding of the IEEE Conference on Computer Vision and Pattern Recognition.2015:4741-4748.
[13]KUSANO G,HIRAOKA Y,UKUMIZU K.Persistence weighted Gaussian kernel for topological data analysis[C]//International Conference on Machine Learning.PMLR,2016:2004-2013.
[14]CARRIERE M,CUTURI M,OUDOT S.Sliced Wassersteinkernel for persistence diagrams[C]//International Conference on Machine Learning.PMLR,2017:664-673.
[15]ADAMS H,EMERSON T,KIRBY M,et al.Persistence images:A stable vector representation of persistent homology[J].Journal of Machine Learning Research,2017,18(8):1-35.
[16]BUBENIK P.Statistical topological data analysis using persistence landscapes[J].Journal of Machine Learning Research,2015,16(1):77-102.
[17]ZIELIŃSKI B,LIPIŃSKI M,JUDA M,et al.Persistence codebooks for topological data analysis[J].Artificial Intelligence Review,2021,54:1969-2009.
[18]DONG Z,LIN H,ZHOU C,et al.Persistence B-spline grids:stable vector representation of persistence diagrams based on data fitting[J].Machine Learning,2024,113(3):1373-1420.
[19]GRAUMAN K,DARRELL T.The pyramid match kernel:Discriminative classification with sets of image features[C]//Tenth IEEE International Conference on Computer Vision(ICCV'05).IEEE,2005:1458-1465.
[20]EDELSBRUNNER H,HARER J L.Computational topology:anintroduction[M].American Mathematical Society,2022.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!