Computer Science, 2017, Vol. 44, Issue (Z11): 411-413. doi: 10.11896/j.issn.1002-137X.2017.11A.087
朱卫星,徐伟光,何红悦,李雯
ZHU Wei-xing, XU Wei-guang, HE Hong-yue and LI Wen
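Abstract: Text is the most natural form for storing and exchanging information, and text mining techniques can uncover latent knowledge patterns hidden in massive text collections. This paper studies topic mining and associative search over text data. First, the text is processed through parsing and extraction, word-segmentation preprocessing, and indexing. Next, a topic discovery model based on latent semantic relations mines the topic information hidden in large volumes of text. Finally, the topic model is used to compute the degree of association between keywords for query expansion, which enables associative search. A prototype system for text data mining and associative search was implemented; it performs topic discovery and associative search on the Tancorp dataset and displays the associative search process synchronously through a visualization and a web page.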
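The query-expansion step can be illustrated with a minimal sketch. The abstract does not say how the association between keywords is computed, so the cosine similarity used here is an assumption, not the authors' method. The `TOPIC_WORD` table, its weights, and the function names `association` and `expand_query` are all hypothetical and exist only for illustration.

```python
import math

# Hypothetical topic-word weights: TOPIC_WORD[t][w] approximates P(word w | topic t).
# The numbers are invented for illustration and are not taken from the paper.
TOPIC_WORD = [
    {"football": 0.5, "match": 0.3, "coach": 0.2},
    {"stock": 0.5, "market": 0.4, "match": 0.1},
]


def word_vector(word):
    """Return the word's weight in each topic as a vector (0.0 if absent)."""
    return [topic.get(word, 0.0) for topic in TOPIC_WORD]


def association(a, b):
    """Cosine similarity between the topic vectors of two words (an assumed measure)."""
    va, vb = word_vector(a), word_vector(b)
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def expand_query(keyword, k=2):
    """Return the k vocabulary words most strongly associated with the keyword."""
    vocab = {w for topic in TOPIC_WORD for w in topic} - {keyword}
    # Sort by descending association; break ties alphabetically for stable output.
    ranked = sorted(vocab, key=lambda w: (-association(keyword, w), w))
    return ranked[:k]


if __name__ == "__main__":
    print(expand_query("football"))
```

In this toy setting, a search for `football` would also retrieve documents containing the expansion terms, which is one plausible reading of "associative search"; a real system would take the topic-word weights from a trained topic model rather than a hand-written table.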