Computer Science ›› 2026, Vol. 53 ›› Issue (3): 158-165. doi: 10.11896/jsjkx.250600063

• Database & Big Data & Data Science •

• Corresponding author: ZHAO Suyun (zhaosuyun@ruc.edu.cn)
• About author: (18811373716@163.com)

Prompt-conditioned Representation Learning with Diffusion Models for Semi-supervised Clustering

WANG Yiming1,2, JIAO Min2, ZHAO Suyun2,3, CHEN Hong1,3, LI Cuiping1,3   

  1 Engineering Research Center of Database and Business Intelligence, MOE, Beijing 100872, China
    2 School of Information, Renmin University of China, Beijing 100872, China
    3 Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education, Beijing 100872, China
  • Received: 2025-06-11  Revised: 2025-10-10  Online: 2026-03-12
  • About author: WANG Yiming, born in 2000, postgraduate. His main research interests include semi-supervised clustering and diffusion models.
    ZHAO Suyun, born in 1979, professor, Ph.D. supervisor, is a member of CCF (No. 62717M). Her main research interests include image processing and applications, generalization analysis of weakly supervised learning, and data security in large-scale models.
  • Supported by:
    National Key Research and Development Program of China (2023YFB4503600) and National Natural Science Foundation of China (U23A20299, U24B20144, 62172424, 62276270, 62322214).


Abstract: Current clustering methods enhance performance by jointly learning cluster-friendly representation spaces and cluster assignments. However, they remain fundamentally constrained by static embedding spaces derived from pre-trained visual encoders, where cluster assignments rely on rigid metric systems (e.g., Euclidean distance or cosine similarity) within the fixed feature space. Inspired by the stable training dynamics and conditional control capabilities of diffusion models, this paper proposes a novel semi-supervised clustering framework. Methodologically, it encodes cluster centers as learnable conditional embedding vectors and constructs a generative metric function driven by noise-prediction error, transcending the linear-separability constraints of traditional Euclidean spaces. A two-stage dynamic optimization strategy is designed, integrating supervised pre-training with semantic anchoring and unsupervised adaptation with matching losses, to balance intra-cluster compactness and inter-cluster separability. Theoretically, based on Rademacher complexity and a bounded noise-prediction assumption, it derives an expected clustering risk upper bound of $\mathcal{O}(k/n)$, proving the asymptotic consistency of the proposed method on large-scale data and guaranteeing its generalization capability. Furthermore, it shows that supervised information, through strong convexity constraints and the Lipschitz continuity of the denoising network, accelerates the decay rate of the dominant error term to $\mathcal{O}(1/(nmc))$, elucidating the compression effect of labeled data on hypothesis-space complexity. Experimentally, the proposed framework achieves competitive results on benchmark datasets such as ImageNet-10, and ablation studies validate the efficacy of its key components.
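The generative metric described in the abstract can be made concrete with a small sketch. This is a hypothetical toy, not the paper's implementation: `oracle_denoiser` stands in for the conditional denoising network ε_θ(x_t, t, c_k), and it is constructed to be exact when the clean sample coincides with the cluster embedding, so the noise-prediction error collapses to a scaled distance between the sample and each cluster's conditional embedding.

```python
import numpy as np

def oracle_denoiser(x_t, alpha_bar, cond):
    """Toy stand-in for the conditional denoising network: inverts the
    forward noising step under the assumption that the clean sample equals
    the cluster embedding `cond` (hypothetical, for illustration only)."""
    return (x_t - np.sqrt(alpha_bar) * cond) / np.sqrt(1.0 - alpha_bar)

def generative_metric_assign(x, cluster_embeds, alpha_bars, rng):
    """Assign `x` to the cluster whose conditional noise-prediction error,
    averaged over the sampled diffusion steps, is smallest."""
    errors = []
    for c in cluster_embeds:
        err = 0.0
        for alpha_bar in alpha_bars:
            eps = rng.standard_normal(x.shape)            # forward-process noise
            x_t = np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps
            eps_hat = oracle_denoiser(x_t, alpha_bar, c)  # conditional prediction
            err += float(np.mean((eps_hat - eps) ** 2))   # noise-prediction error
        errors.append(err / len(alpha_bars))
    return int(np.argmin(errors)), errors

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0]])  # two conditional cluster embeddings
x = np.array([0.4, 0.2])                      # sample near the first embedding
k, errs = generative_metric_assign(x, centers, alpha_bars=[0.9, 0.5, 0.1], rng=rng)
print(k)  # prints 0: the noise cancels exactly, so assignment is deterministic
```

With this oracle the error at each step equals (ᾱ/(1-ᾱ))·‖x - c‖², so the assignment recovers a nearest-embedding rule; in the paper's method the denoiser is a learned network and the embeddings are trained, so the induced metric need not be Euclidean.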

Key words: Semi-supervised learning, Prompt representation learning, Diffusion models, Generative metric methods, Clustering risk

CLC number: TP181