Research on Web Document Clustering Based on Sentence-Level Maximal Frequent Word Sets

Computer Science, 2007, Vol. 34, Issue 7: 154-157.

• Software Engineering and Database Technology •

LU Song-Feng, CHEN Yun-Kai, YUAN Li

School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074

Published: 2018-11-16 Online: 2018-11-16

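Abstract

Abstract: Web document clustering is an important research direction in Web mining. The frequent patterns produced by existing mining algorithms are high-dimensional and do not adequately reflect the semantic information expressed by a document. To obtain more accurate clustering results, this paper proposes a sentence-level maximal frequent word set mining method for extracting document feature terms. On this basis, documents are first clustered roughly, and the clusters are then merged or split according to thresholds on inter-cluster distance and intra-cluster link strength, yielding the final clustering. Throughout this process, the variable precision rough set model is used to compute the feature vector of each cluster. Experimental results show that the proposed algorithm outperforms traditional document clustering algorithms.

Key words: Web document clustering, rough set, association rules, maximal frequent word set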
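
The abstract names sentence-level maximal frequent word set mining but does not spell out the procedure. The Python sketch below is one possible reading, not the authors' algorithm. It assumes that a word set is supported by a document when all of its words co-occur in at least one sentence of that document, that candidates are grown level by level in the Apriori style, and that sentences are split naively on punctuation. The function names and the `min_support` parameter are hypothetical.

```python
from itertools import combinations


def sentence_word_sets(document):
    """Split a document into sentences; return one set of lowercase words per sentence."""
    text = document.replace("!", ".").replace("?", ".")
    return [frozenset(s.lower().split()) for s in text.split(".") if s.strip()]


def support(word_set, docs):
    """Count documents where all words of word_set co-occur in at least one sentence."""
    return sum(any(word_set <= sent for sent in doc) for doc in docs)


def maximal_frequent_word_sets(documents, min_support):
    """Assumed procedure: Apriori-style growth, then keep only the maximal sets."""
    docs = [sentence_word_sets(d) for d in documents]
    vocabulary = {w for doc in docs for sent in doc for w in sent}
    # Level 1: frequent single words.
    current = {frozenset([w]) for w in vocabulary
               if support(frozenset([w]), docs) >= min_support}
    frequent = set(current)
    # Join frequent k-sets into (k+1)-candidates until no candidate survives.
    while current:
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == len(a) + 1}
        current = {c for c in candidates if support(c, docs) >= min_support}
        frequent |= current
    # A set is maximal if no frequent proper superset exists.
    return {s for s in frequent if not any(s < t for t in frequent)}


example_docs = [
    "web mining finds patterns. clustering groups documents",
    "web mining is useful. rough sets help",
    "clustering groups pages. web mining again",
]
for word_set in maximal_frequent_word_sets(example_docs, min_support=2):
    print(sorted(word_set))
```

Under this reading, "web" and "clustering" appear together in two of the example documents but never in the same sentence, so they do not form a frequent set. Restricting co-occurrence to sentences is what is assumed here to cut dimensionality and keep the word sets semantically coherent.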

LU Song-Feng, CHEN Yun-Kai, YUAN Li. Research on Web Document Clustering Based on Sentence-Level Maximal Frequent Word Sets [J]. Computer Science, 2007, 34(7): 154-157. https://doi.org/
