Computer Science ›› 2012, Vol. 39 ›› Issue (3): 200-205.


Improvement of Web Information Extraction Based on Genetic Algorithm and Hidden Markov Model

LI Rong, HU Zhi-jun, ZHENG Jia-heng

  • Online:2018-11-16 Published:2018-11-16

Abstract: To further improve the accuracy and efficiency of Web information extraction, and to address the shortcomings of the hybrid genetic algorithm and first-order hidden Markov model method in initial value selection and parameter optimization, an improved combined method that embeds a genetic algorithm in a second-order hidden Markov model is presented. In the hierarchical preprocessing phase, text is segmented hierarchically into lines, blocks and words by using format information and text features. The embedded genetic algorithm and second-order hidden Markov hybrid model is then used to train the parameters: both the optimal and suboptimal chromosomes are retained to modify the initial parameters of the Baum-Welch algorithm, and the genetic algorithm is applied repeatedly to fine-tune the second-order hidden Markov model. Finally, an improved Viterbi algorithm is used to extract Web information. Experimental results show that the new method improves precision, recall and running time.

Key words: Information extraction, Genetic algorithm, Second-order hidden Markov model, Hierarchy
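
To illustrate the decoding step mentioned in the abstract, the following is a minimal sketch of textbook Viterbi decoding for a second-order hidden Markov model, in which each transition depends on the two preceding states. It is not the improved Viterbi algorithm of the paper, whose details are not given here. The function name `viterbi2`, the parameter names `pi`, `a1`, `a2` and `b`, and the assumption that an emission depends only on the current state are all illustrative choices, not taken from the paper.

```python
import math


def _log(p):
    # Log that maps zero probability to -inf instead of raising an error.
    return math.log(p) if p > 0 else float("-inf")


def viterbi2(obs, pi, a1, a2, b):
    """Most likely state path under a second-order HMM (illustrative sketch).

    pi[i]       : P(s_0 = i)
    a1[i][j]    : P(s_1 = j | s_0 = i), used only for the first transition
    a2[i][j][k] : P(s_t = k | s_{t-2} = i, s_{t-1} = j)
    b[k][o]     : P(o_t = o | s_t = k)  (assumed first-order emissions)
    """
    n, T = len(pi), len(obs)
    if T == 0:
        return []
    if T == 1:
        return [max(range(n), key=lambda i: _log(pi[i]) + _log(b[i][obs[0]]))]

    # delta[(j, k)]: best log-probability of any path whose last two
    # states are (j, k) after the observations seen so far.
    delta = {
        (i, j): _log(pi[i]) + _log(b[i][obs[0]])
        + _log(a1[i][j]) + _log(b[j][obs[1]])
        for i in range(n) for j in range(n)
    }
    back = []  # one back-pointer table per time step t >= 2
    for t in range(2, T):
        new_delta, ptr = {}, {}
        for j in range(n):
            for k in range(n):
                # Choose the best state two steps back for the pair (j, k).
                best_i = max(
                    range(n),
                    key=lambda i: delta[(i, j)] + _log(a2[i][j][k]),
                )
                new_delta[(j, k)] = (
                    delta[(best_i, j)] + _log(a2[best_i][j][k])
                    + _log(b[k][obs[t]])
                )
                ptr[(j, k)] = best_i
        delta = new_delta
        back.append(ptr)

    # Backtrack from the best final state pair.
    path = list(max(delta, key=delta.get))
    for ptr in reversed(back):
        path.insert(0, ptr[(path[0], path[1])])
    return path
```

In an extraction setting, the states would correspond to the fields to be extracted and the observations to the words or blocks produced by preprocessing. Working over state pairs costs O(T·n³) time for n states, compared with O(T·n²) for a first-order model.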
