Medical Data Similarity Algorithm Analysis Based on Relative-IDF

Computer Science ›› 2014, Vol. 41 ›› Issue (Z6): 417-420.

XIANG Lin-hong, ZHANG Ju, SUN Qi-long, ZHAO Xue-liang

Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 404100, China

Online: 2018-11-14 Published: 2018-11-14

Funding: This work was supported by the National Key Technology R&D Program of China project "Research, Development and Demonstration Application of Online Drug Trading Service Technology and Platform" (2012BAH19F01).

Abstract

Similarity calculation for medical data plays an important role in drug information processing. Traditional text similarity measures do not perform well in the medical domain. To address the particular characteristics of medical data text, this paper proposes a medical data similarity algorithm based on Relative-IDF. Experimental results show that, compared with traditional methods such as TF-IDF and edit distance, the Relative-IDF-based similarity calculation substantially improves both efficiency and accuracy.

Key words: Medical data similarity, Edit distance, Relative-IDF, TF-IDF

CLC number: TP311.1 Document code: A
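
For readers unfamiliar with the two baselines named in the abstract, the following is a minimal Python sketch of edit distance (Levenshtein) and TF-IDF cosine similarity. It does not implement Relative-IDF, which the abstract does not define. The drug-record strings, the whitespace tokenization, and the smoothed IDF formula are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def tfidf_vectors(docs):
    """Return one {term: tf * idf} dict per document (whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF; this particular formula is an assumption of the sketch.
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical drug records, invented for this example.
docs = [
    "amoxicillin capsules 250 mg",
    "amoxicillin capsule 250 mg",
    "ibuprofen tablets 200 mg",
]
vecs = tfidf_vectors(docs)
sim_close = cosine(vecs[0], vecs[1])  # near-duplicate records
sim_far = cosine(vecs[0], vecs[2])    # different drugs sharing only "mg"
dist = edit_distance(docs[0], docs[1])
print(sim_close, sim_far, dist)
```

In this toy setup, token-level TF-IDF treats "capsules" and "capsule" as unrelated terms, while character-level edit distance sees the two records as one edit apart. This kind of mismatch on short, near-identical product strings is one plausible reason generic text measures struggle on such data, though the abstract itself does not spell out the cause.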

XIANG Lin-hong, ZHANG Ju, SUN Qi-long, ZHAO Xue-liang. Medical Data Similarity Algorithm Analysis Based on Relative-IDF[J]. Computer Science, 2014, 41(Z6): 417-420. https://doi.org/

