Computer Science (计算机科学), 2015, Vol. 42, Issue (5): 119-123. doi: 10.11896/j.issn.1002-137X.2015.05.024

• 2014 Data Mining Conference •

Project Topic Model Construction Based on Semi-supervised Graph Clustering

SHI Lin-bin (石林宾), YU Zheng-tao (余正涛), YAN Xin (严馨), SONG Hai-xia (宋海霞) and HONG Xu-dong (洪旭东)

  1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500 (all authors)
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61175068), the National Innovation Fund for Small and Medium-sized Enterprises (11C26215305905), and a Major Special Project of the Yunnan Provincial Department of Education Fund

Abstract: The quality of a project document's topic representation directly affects the subsequent recommendation of evaluation experts. To make effective use of the association relationships among project document fragments in topic analysis, we propose a project topic model construction method based on semi-supervised graph clustering. The method first analyzes the structural characteristics of project documents and extracts structural information that characterizes project topics, such as the project name and project keywords. Combining this with external resources that characterize expert topics, such as expert evidence documents and expert topic relationship networks, it defines and extracts association relationship features among project document fragments. It then uses the different types of association relationships to compute the correlation between fragments and builds an undirected graph model over them. Finally, taking the labeled association relationship features as supervision information, it applies a semi-supervised graph clustering algorithm to cluster the fragments and thereby extract the project topics. Comparative experiments on project topic extraction verify the effectiveness of the proposed method and indicate that the structural features of project documents, expert evidence documents, and expert topic relationship networks provide some guidance for constructing the project topic model.

Key words: Topic model, Semi-supervised graph clustering, Association relationship features, Evaluation expert recommendation
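
The pipeline outlined in the abstract (fragment correlation, undirected graph, clustering guided by labeled pairs) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes fragments are given as token sets, uses Jaccard overlap as a stand-in for the paper's association relationship features, and replaces the semi-supervised graph clustering algorithm with a much simpler greedy constrained merge. The names `jaccard`, `build_graph`, `cluster`, `must_link`, `cannot_link`, `boost` and `threshold` are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(fragments, must_link=(), cannot_link=(), boost=1.0):
    """Return {(i, j): weight} for an undirected graph over fragment indices.

    Assumption: must-link pairs (labeled as same topic) get their weight
    raised by `boost`; cannot-link pairs have their edge removed.
    """
    weights = {}
    for i, j in combinations(range(len(fragments)), 2):
        weights[(i, j)] = jaccard(fragments[i], fragments[j])
    for i, j in must_link:
        key = (min(i, j), max(i, j))
        weights[key] = weights.get(key, 0.0) + boost
    for i, j in cannot_link:
        weights.pop((min(i, j), max(i, j)), None)
    return weights


def cluster(n, weights, cannot_link=(), threshold=0.3):
    """Greedy constrained merging: join the strongest edges first and skip
    any merge that would place a cannot-link pair in the same cluster."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        if w < threshold:
            break  # remaining edges are too weak to merge on
        ri, rj = find(i), find(j)
        if ri == rj:
            continue
        violates = any({find(a), find(b)} == {ri, rj} for a, b in cannot_link)
        if not violates:
            parent[rj] = ri

    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())


# Toy fragments (invented for illustration only).
fragments = [
    {"topic", "model", "text"},
    {"topic", "model", "lda"},
    {"graph", "clustering", "kernel"},
    {"graph", "clustering", "semi"},
    {"expert", "recommendation"},
]
must = [(3, 4)]    # labeled as the same topic
cannot = [(1, 2)]  # labeled as different topics
graph = build_graph(fragments, must_link=must, cannot_link=cannot)
print(cluster(len(fragments), graph, cannot_link=cannot))
# → [[0, 1], [2, 3, 4]]
```

In this toy run, fragment 4 shares no tokens with fragment 3 but joins its cluster because of the must-link label, which is the role the labeled association features play as supervision in the abstract.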

