Computer Science, 2017, Vol. 44, Issue (Z11): 411-413. doi: 10.11896/j.issn.1002-137X.2017.11A.087

• Big Data and Data Mining •

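Research on Text Data Topic Mining and Association Search

ZHU Wei-xing, XU Wei-guang, HE Hong-yue and LI Wen

  Information Management Center, PLA University of Science and Technology, Nanjing 210007; College of Command Information Systems, PLA University of Science and Technology, Nanjing 210007; College of Command Information Systems, PLA University of Science and Technology, Nanjing 210007; Information Management Center, PLA University of Science and Technology, Nanjing 210007
  • Online: 2018-12-01  Published: 2018-12-01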

Abstract: Text data is the most natural way of storing and exchanging information, and text mining technology can discover the latent knowledge patterns hidden in massive text data. This paper studies topic mining and association search for text data. First, text information is processed through text parsing and extraction, word segmentation preprocessing, and indexing. Then a topic discovery model based on latent semantic relations is used to mine the topic information hidden in large amounts of text data. Finally, the topic model is used to calculate the degree of association between keywords for query expansion, thereby achieving association search. A prototype system for text data mining and association search was implemented; topic discovery and association search were performed on the Tancorp dataset, and the association search process was displayed synchronously through visualization and Web pages.

Key words: Text mining, Topic discovery, Association search
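The abstract names the pipeline stages (parsing and segmentation, indexing, latent-semantic topic discovery, keyword association for query expansion) but not the concrete algorithms. The sketch below is one possible reading under stated assumptions, not the authors' method: latent semantic analysis (LSA) stands in for the "latent semantic relations" topic model, and cosine similarity between LSA term vectors stands in for the keyword association degree. The tokenizer (a Latin-word regex in place of Chinese word segmentation), the raw-count weighting, the `top_n` and `threshold` values, and every function name are hypothetical. To stay dependency-free, the SVD is replaced by power iteration with deflation on A·Aᵀ.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    # Hypothetical stand-in for word segmentation: lowercase Latin words only.
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    # Inverted index: term -> set of ids of the documents that contain it.
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def term_doc_matrix(docs, vocab):
    # Raw term counts, rows = terms, columns = documents (assumed weighting;
    # a real system would probably use TF-IDF or similar).
    row = {t: i for i, t in enumerate(vocab)}
    a = [[0.0] * len(docs) for _ in vocab]
    for j, text in enumerate(docs):
        for term in tokenize(text):
            a[row[term]][j] += 1.0
    return a


def top_eigenpairs(c, k, iters=300):
    # Top-k eigenpairs of a symmetric PSD matrix: power iteration + deflation.
    n = len(c)
    c = [r[:] for r in c]
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.01 * i for i in range(n)]  # deterministic, non-uniform start
        for _ in range(iters):
            w = [sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * c[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        for i in range(n):  # remove the found component before the next pass
            for j in range(n):
                c[i][j] -= lam * v[i] * v[j]
    return pairs


def lsa_term_vectors(docs, vocab, k):
    # LSA term vectors U_k * Sigma_k: eigenvectors of A A^T scaled by sqrt(eigenvalue).
    a = term_doc_matrix(docs, vocab)
    aat = [[sum(x * y for x, y in zip(ri, rj)) for rj in a] for ri in a]
    pairs = top_eigenpairs(aat, k)
    return {t: [v[i] * math.sqrt(max(lam, 0.0)) for lam, v in pairs]
            for i, t in enumerate(vocab)}


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def expand_query(query, term_vecs, top_n=3, threshold=0.5):
    # Keep the query terms (weight 1.0) and add up to top_n associated terms
    # per query term, weighted by their association degree (cosine similarity).
    weights = {t: 1.0 for t in tokenize(query) if t in term_vecs}
    for q in list(weights):
        scored = sorted(((cosine(term_vecs[q], term_vecs[t]), t)
                         for t in term_vecs if t != q), reverse=True)
        for sim, t in scored[:top_n]:
            if sim >= threshold and t not in weights:
                weights[t] = sim
    return weights


def association_search(query, index, term_vecs, **kwargs):
    # Rank documents by the summed weights of the expanded terms they contain.
    scores = defaultdict(float)
    for term, w in expand_query(query, term_vecs, **kwargs).items():
        for doc_id in index.get(term, ()):
            scores[doc_id] += w
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    docs = ["car engine repair", "automobile engine repair", "car automobile engine",
            "banana fruit", "banana fruit salad"]
    index = build_index(docs)
    term_vecs = lsa_term_vectors(docs, sorted(index), k=2)
    print(sorted(index["automobile"]))  # keyword match alone: [1, 2]
    print(association_search("automobile", index, term_vecs))
```

On this toy corpus a plain index lookup for `automobile` finds documents 1 and 2 only, while the expanded query also reaches document 0 ("car engine repair") through the car/engine/repair associations learned in the latent space. Power iteration merely keeps the sketch self-contained; a production system would more likely call an SVD routine such as `numpy.linalg.svd` or use a dedicated topic-model library.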
