Computer Science, 2015, Vol. 42, Issue (Z11): 7-9.

• Intelligent Computing •

Disambiguation Algorithm Design and Implementation of Food Safety Issues in Network

LIU Jin-shuo, DENG Ying-ying and DENG Juan

  1. School of Computer Science, Wuhan University, Wuhan 430072; International School of Software, Wuhan University, Wuhan 430072; International School of Software, Wuhan University, Wuhan 430072
  • Online: 2018-11-14  Published: 2018-11-14
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61303214).

Abstract: Taking online food safety information as its object of study, this paper proposes a disambiguation algorithm that resolves ambiguous references of domain-specific terms (proper nouns) in the food safety field. The algorithm builds on an improved TF-IDF feature selection algorithm and combines a hidden Markov model (HMM) with an SVM classifier to disambiguate these terms. The improved feature extraction algorithm, LN-TF-IDF, adds two weighting factors to traditional TF-IDF. Experiments on 202,831 texts, using the F1 score (the harmonic mean of precision and recall) as the evaluation criterion, show that the disambiguation algorithm based on the improved TF-IDF outperforms the one based on traditional TF-IDF by 7.31% on average. Its performance also remains relatively stable across experimental datasets crawled at different times.

Key words: Food safety, Disambiguation, HMM, TF-IDF, SVM
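
The abstract names two quantities that can be made concrete: the traditional TF-IDF weight used as the baseline, and the F1 score used as the evaluation criterion. The following is a minimal Python sketch of both, under stated assumptions. It uses one common textbook TF-IDF variant (term frequency normalized by document length, and idf = log(N / df)), which may differ from the exact formula in the paper. The two additional weighting factors of LN-TF-IDF are not specified in the abstract, so they are deliberately not reproduced here. The function names `tf_idf` and `f1_score` and the toy documents are hypothetical and for illustration only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    This is a baseline sketch, not the paper's LN-TF-IDF.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy, made-up tokenized documents for illustration only.
docs = [
    ["milk", "powder", "melamine"],
    ["milk", "price", "market"],
    ["melamine", "plastic", "factory"],
]
weights = tf_idf(docs)
```

In this sketch, a term that appears in fewer documents (such as "powder") receives a higher weight than one that appears in several (such as "milk"), which is the property that a feature selection step relies on before the selected features are passed to a classifier.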

