Computer Science (计算机科学), 2015, Vol. 42, Issue (5): 62-66. doi: 10.11896/j.issn.1002-137X.2015.05.013
刘琼琼,左万利,王 英
LIU Qiong-qiong, ZUO Wan-li and WANG Ying
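Abstract: Web page topic mining is important for natural language processing tasks such as web page text classification, automatic summarization, and information fusion. Mining the topics of a web page helps users better understand its content. Some existing work mines concepts from plain text, but it rarely considers how the tag that contains a word and the word's position affect that word's weight, and no prior work gives a method for computing these two influence factors. Using WordNet, we lift web page topics from the word level to the concept level. We propose a topic concept mining method that determines the sense of each word in a page through part-of-speech tagging and word sense disambiguation, and that uses a tag influence factor and a position influence factor to correct the weights of the features in the page's body text. We also give formulas for computing both influence factors. Experiments on the DMOZ dataset show that weight correction clearly improves topic mining precision, which reaches up to 0.95.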
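The weight-correction idea can be illustrated with a small sketch. The abstract does not state the paper's actual formulas, so everything numeric here is a hypothetical assumption: the tag factor values in `TAG_FACTOR`, the linear positional decay in `position_factor`, and the helper names are illustrative only. The sketch also assumes each word has already been mapped to a concept (written as a WordNet-style synset identifier) by part-of-speech tagging and word sense disambiguation. Each occurrence then contributes `tag factor × position factor` to its concept's weight, and concepts are ranked by the corrected weight.

```python
from collections import defaultdict

# HYPOTHETICAL tag influence factors: the paper's real values/formula are
# not given in the abstract; these numbers are illustrative assumptions.
TAG_FACTOR = {"title": 3.0, "h1": 2.5, "h2": 2.0, "b": 1.5, "p": 1.0}


def position_factor(index: int, total: int) -> float:
    """HYPOTHETICAL position factor: words nearer the start of the text
    weigh more, decaying linearly from 1.5 (first word) to 1.0 (last)."""
    if total <= 1:
        return 1.5
    return 1.5 - 0.5 * index / (total - 1)


def concept_weights(tokens):
    """tokens: list of (concept, tag) pairs in document order, where each
    word is assumed to be already disambiguated to a concept identifier.

    Returns a dict concept -> corrected weight: a term frequency in which
    every occurrence counts tag_factor * position_factor instead of 1."""
    weights = defaultdict(float)
    total = len(tokens)
    for i, (concept, tag) in enumerate(tokens):
        # Unknown tags fall back to a neutral factor of 1.0.
        weights[concept] += TAG_FACTOR.get(tag, 1.0) * position_factor(i, total)
    return dict(weights)


def top_concepts(tokens, k=3):
    """Return the k concepts with the highest corrected weight."""
    w = concept_weights(tokens)
    return sorted(w, key=w.get, reverse=True)[:k]


if __name__ == "__main__":
    page = [
        ("computer.n.01", "title"),
        ("software.n.01", "p"),
        ("computer.n.01", "p"),
        ("music.n.01", "p"),
    ]
    print(concept_weights(page))
    print(top_concepts(page, k=2))
```

In this toy page, the occurrence of `computer.n.01` in the title near the start of the page dominates the ranking, which is the effect that tag and position factors are meant to capture; plain term frequency would instead give it only a weight of 2.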