Computer Science ›› 2015, Vol. 42 ›› Issue (1): 261-267. doi: 10.11896/j.issn.1002-137X.2015.01.058

Measuring Semantic Similarity between Words Using Web Search Engines

CHEN Hai-yan   

  • Online:2018-11-14 Published:2018-11-14

Abstract: Semantic similarity measures play important roles in many Web-related tasks such as Web browsing and query suggestion. Because taxonomy-based methods cannot handle continually emerging words, Web-based methods have recently been proposed to address this problem. However, because of the noise and redundancy hidden in Web data, robustness and accuracy remain challenges. We propose a method that integrates page counts and snippets returned by Web search engines. Semantic snippets and the number of search results are then used to remove noise and redundancy from the Web snippets. After that, a method integrating page counts, semantic snippets and the number of already displayed search results is proposed. The proposed method requires no human-annotated knowledge and can be applied easily to Web-related tasks. A correlation coefficient of 0.851 against the Rubenstein-Goodenough benchmark dataset shows that the proposed method outperforms existing Web-based methods by a wide margin. Moreover, the proposed semantic similarity measure significantly improves the quality of query suggestion compared with some page-count-based methods.

Key words: Semantic similarity, Information retrieval, Query suggestion, Web search
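
The abstract does not give the formulas behind the proposed measure, so the following is only a minimal sketch of the two generic building blocks it mentions: a similarity score computed from search-engine page counts, and a correlation coefficient against human ratings. The page-count score shown is the WebJaccard coefficient known from earlier Web-based work (Bollegala et al.), not the author's combined method. The function names, the noise threshold of 5 and all numbers in the usage example are assumptions made for illustration.

```python
from math import sqrt


def web_jaccard(count_p, count_q, count_pq, min_cooccurrence=5):
    """Jaccard coefficient over page counts.

    count_p, count_q: page counts for the queries "P" and "Q".
    count_pq: page count for the conjunctive query "P AND Q".
    Co-occurrence counts at or below min_cooccurrence are treated as
    noise and scored 0 (the threshold value is an assumption).
    """
    if count_pq <= min_cooccurrence:
        return 0.0
    return count_pq / (count_p + count_q - count_pq)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical page counts (P, Q, P AND Q) for three word pairs.
# These are made-up numbers, not search-engine results.
page_counts = [
    (120_000, 95_000, 40_000),
    (300_000, 10_000, 2_500),
    (80_000, 70_000, 3),
]
# Hypothetical human similarity ratings for the same three pairs.
human_ratings = [3.9, 1.4, 0.1]

scores = [web_jaccard(p, q, pq) for p, q, pq in page_counts]
print(scores)
print(pearson(scores, human_ratings))
```

An evaluation of the kind reported in the abstract would replace the made-up numbers with real page counts and the Rubenstein-Goodenough ratings; snippet-based features would have to be added separately.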
