Computer Science ›› 2012, Vol. 39 ›› Issue (3): 200-205.

• Artificial Intelligence •

Improvement of Web Information Extraction Based on Genetic Algorithm and Hidden Markov Model

LI Rong, HU Zhi-jun, ZHENG Jia-heng

  1. (Department of Computer Science, Xinzhou Teachers University, Xinzhou 034000); (School of Computer and Information Technology, Shanxi University, Taiyuan 030006)
  • Online: 2018-11-16 Published: 2018-11-16

Abstract: To further improve the accuracy and efficiency of Web information extraction, and to address the shortcomings in initial value selection and parameter optimization of the hybrid method that combines a genetic algorithm with a first-order hidden Markov model, an improved method that embeds a genetic algorithm in a second-order hidden Markov model is presented. In the hierarchical preprocessing phase, format information and text features are used to segment the text into appropriate levels such as text lines, blocks, or individual words. The embedded genetic algorithm and second-order hidden Markov hybrid model is then used to train the parameters: the optimal and suboptimal chromosomes are retained to correct the initial parameters of the Baum-Welch algorithm, and the genetic algorithm is applied repeatedly to fine-tune the second-order hidden Markov model. Finally, an improved Viterbi algorithm is used to extract Web information. Experimental results show that the improved method outperforms the hybrid of a genetic algorithm and a first-order hidden Markov model in precision, recall, and time performance.

Key words: Web information extraction, Genetic algorithm, Second-order hidden Markov model, Hierarchy
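
The abstract names a second-order hidden Markov model decoded with an improved Viterbi algorithm, but gives no details of the improvement. As a rough illustration of the general idea only, the following Python sketch shows a textbook second-order Viterbi decoder, in which each state depends on the two preceding states. It is not the authors' improved variant. The function name `viterbi2` and the parameter names `pi`, `a1`, `a2`, and `b` are hypothetical, and the sketch assumes that emissions depend only on the current state.

```python
import math

NEG_INF = float("-inf")


def _log(p):
    # Logarithm that maps a zero probability to -inf instead of raising.
    return math.log(p) if p > 0 else NEG_INF


def viterbi2(obs, pi, a1, a2, b):
    """Most likely state path under a second-order HMM (illustrative sketch).

    obs         : list of observation indices
    pi[i]       : P(s_1 = i)
    a1[i][j]    : P(s_2 = j | s_1 = i), used only for the second position
    a2[i][j][k] : P(s_t = k | s_{t-2} = i, s_{t-1} = j)
    b[k][o]     : P(o_t = o | s_t = k)  (assumed first-order emissions)
    """
    n, T = len(pi), len(obs)
    if T == 0:
        return []
    if T == 1:
        return [max(range(n), key=lambda i: _log(pi[i]) + _log(b[i][obs[0]]))]

    # delta[j][k]: best log-score of any path whose last two states are (j, k).
    delta = [[_log(pi[j]) + _log(b[j][obs[0]])
              + _log(a1[j][k]) + _log(b[k][obs[1]])
              for k in range(n)] for j in range(n)]
    back = []  # one table per step: psi[j][k] = best state two positions back

    for t in range(2, T):
        new_delta = [[NEG_INF] * n for _ in range(n)]
        psi = [[0] * n for _ in range(n)]
        for j in range(n):
            for k in range(n):
                best_i, best = 0, NEG_INF
                for i in range(n):
                    score = delta[i][j] + _log(a2[i][j][k])
                    if score > best:
                        best_i, best = i, score
                new_delta[j][k] = best + _log(b[k][obs[t]])
                psi[j][k] = best_i
        delta = new_delta
        back.append(psi)

    # Pick the best final state pair, then follow the back-pointers.
    pairs = [(j, k) for j in range(n) for k in range(n)]
    last_j, last_k = max(pairs, key=lambda p: delta[p[0]][p[1]])
    path = [last_j, last_k]
    for psi in reversed(back):
        path.insert(0, psi[path[0]][path[1]])
    return path
```

Working in log space avoids numerical underflow on long sequences. The dynamic-programming table is indexed by state pairs rather than single states, so the cost per observation grows with the cube of the number of states.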
