Computer Science, 2015, Vol. 42, Issue (5): 62-66. doi: 10.11896/j.issn.1002-137X.2015.05.013

Topic Concept Discovery for Web Pages

LIU Qiong-qiong, ZUO Wan-li and WANG Ying   

Online: 2018-11-14  Published: 2018-11-14

Abstract: Topic discovery from Web pages has an important impact on natural language processing tasks such as text classification, automatic summarization, and information fusion. Mining the topics of Web pages can help users better understand their content. Although some papers discuss topic discovery from ordinary texts, few of them consider how the label a word belongs to and the location in which a word appears affect that word's weight, and none of them gives methods for calculating these two impact factors. This article extends Web page topics from the word level to the concept level based on WordNet, uses part-of-speech tagging to determine the POS of each word, and uses word sense disambiguation to determine each word's meaning in the page. It then makes full use of a label impact factor and a location impact factor to adjust the weights of concepts, and proposes formulas for calculating these two impact factors. Experimental results on the DMOZ dataset show that, compared with the unadjusted weighting method, the adjusted weighting method significantly improves topic mining accuracy, reaching up to 0.95 in the best case.
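The abstract does not state the two impact-factor formulas, so the Python sketch below only illustrates the general idea of adjusting concept weights. Each occurrence of a disambiguated concept contributes one unit of frequency, which is then scaled by a label factor and a location factor. Everything specific here is a hypothetical assumption and not the authors' method: treating a "label" as the HTML tag enclosing the word, the values in `LABEL_FACTOR`, the linear decay in `location_factor`, the multiplicative combination, and the synset-style concept IDs. POS tagging and word sense disambiguation are assumed to have already run upstream.

```python
from collections import defaultdict

# HYPOTHETICAL label impact factors keyed by the HTML tag enclosing a word.
# Illustrative values only; the paper's own formulas are not reproduced here.
LABEL_FACTOR = {"title": 3.0, "h1": 2.5, "h2": 2.0, "a": 1.2, "p": 1.0}


def location_factor(position: int, total: int) -> float:
    """HYPOTHETICAL location impact factor: decays linearly from 1.5 for the
    first occurrence on the page (position 0) to 1.0 for the last one."""
    if total <= 1:
        return 1.5
    return 1.5 - 0.5 * position / (total - 1)


def concept_weights(occurrences, adjust=True):
    """Sum per-concept weights over (concept, label, position) triples.

    `concept` is assumed to be a WordNet synset name produced by upstream
    POS tagging and word sense disambiguation; positions run 0..len-1.
    With adjust=False each occurrence counts 1 (plain frequency); with
    adjust=True it is scaled by the label and location factors."""
    total = len(occurrences)
    weights = defaultdict(float)
    for concept, label, position in occurrences:
        w = 1.0
        if adjust:
            w *= LABEL_FACTOR.get(label, 1.0) * location_factor(position, total)
        weights[concept] += w
    return dict(weights)


def top_concepts(weights, k=2):
    """Return the k highest-weighted concepts (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [concept for concept, _ in ranked[:k]]


# Toy page: (concept, enclosing tag, occurrence position).
EXAMPLE = [
    ("car.n.01", "title", 0),
    ("engine.n.01", "h1", 1),
    ("price.n.01", "p", 2),
    ("price.n.01", "p", 3),
    ("price.n.01", "a", 4),
]

if __name__ == "__main__":
    print(top_concepts(concept_weights(EXAMPLE, adjust=False)))  # ['price.n.01', 'car.n.01']
    print(top_concepts(concept_weights(EXAMPLE)))  # ['car.n.01', 'price.n.01']
```

On this toy page, plain frequency ranks `price.n.01` first, while the adjusted weights promote `car.n.01` because it appears in the title at the start of the page. This only mirrors, in miniature, the abstract's point that a word's label and location should influence its concept weight.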

Key words: Part-of-speech tagging, Word sense disambiguation, Label impact factor, Location impact factor, Adjusted weights
