Computer Science ›› 2018, Vol. 45 ›› Issue (6A): 97-100.

• Intelligent Computing •

Word Segmentation Based on Adaptive Hidden Markov Model in Oilfield

GONG Fa-ming, ZHU Peng-hai

  1. College of Computer & Communication Engineering, China University of Petroleum, Qingdao, Shandong 266580, China
  • Online: 2018-06-20 Published: 2018-08-03

Abstract: Chinese word segmentation is the first step in constructing a petroleum-field ontology. Documents in the petroleum field have unique characteristics that make word segmentation more complex. Until now, there has been no effective word segmentation algorithm for this field, especially for Chinese characters. Based on the hidden Markov model, this paper proposes an adaptive hidden Markov word segmentation model that introduces a terminology set to combine a domain-knowledge dictionary with user-defined information. The proposed algorithm calibrates word segmentation under semantic constraints and word-meaning constraints, and can accurately identify professional terms and character combinations in the petroleum field. The proposed algorithm is also shown to achieve remarkable improvements in both accuracy and recall in word segmentation compared with the NLPIR Chinese word segmentation system developed by the Chinese Academy of Sciences.
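
The abstract does not spell out how the terminology set interacts with the hidden Markov model, so the following is only a minimal sketch of one common way to combine the two, not the authors' method. It assumes the usual B/M/E/S character-tagging scheme, locks the longest terminology matches as fixed tags, and lets Viterbi decoding tag the remaining characters. Every name (`lock_terms`, `viterbi`, `segment`, `TERMS`) and every probability below is a hypothetical toy value chosen for illustration.

```python
import math

STATES = "BMES"  # Begin, Middle, End of a word, Single-character word
NEG_INF = float("-inf")

def lock_terms(text, terms):
    """Greedy longest match: map positions covered by a term to a forced tag."""
    forced, i = {}, 0
    max_len = max((len(t) for t in terms), default=0)
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 1, -1):
            if text[i:i + n] in terms:
                forced[i] = "B"
                for k in range(i + 1, i + n - 1):
                    forced[k] = "M"
                forced[i + n - 1] = "E"
                i += n
                break
        else:
            i += 1  # no term starts here; leave this character to the HMM
    return forced

def viterbi(text, start, trans, emit, forced, floor=math.log(1e-6)):
    """Best B/M/E/S tag string in log space, honouring forced tags."""
    def allowed(i):
        return forced.get(i, STATES)
    score = {s: start.get(s, NEG_INF) + emit.get(s, {}).get(text[0], floor)
             if s in allowed(0) else NEG_INF for s in STATES}
    back = []
    for i in range(1, len(text)):
        new, ptr = {}, {}
        for s in STATES:
            if s not in allowed(i):
                new[s], ptr[s] = NEG_INF, None
                continue
            prev = max(STATES, key=lambda p: score[p] + trans[p].get(s, NEG_INF))
            new[s] = (score[prev] + trans[prev].get(s, NEG_INF)
                      + emit.get(s, {}).get(text[i], floor))
            ptr[s] = prev
        back.append(ptr)
        score = new
    tags = [max("ES", key=lambda s: score[s])]  # a sentence must end a word
    for ptr in reversed(back):
        tags.append(ptr[tags[-1]])
    return "".join(reversed(tags))

def segment(text, terms, start, trans, emit):
    """Split text into words using terminology locks plus HMM decoding."""
    if not text:
        return []
    tags = viterbi(text, start, trans, emit, lock_terms(text, terms))
    words, cur = [], ""
    for ch, tag in zip(text, tags):
        cur += ch
        if tag in "ES":
            words.append(cur)
            cur = ""
    return words

def logs(probs):
    return {k: math.log(v) for k, v in probs.items()}

# Toy parameters (assumed, not estimated from any corpus).
START = logs({"B": 0.6, "S": 0.4})
TRANS = {"B": logs({"M": 0.3, "E": 0.7}), "M": logs({"M": 0.3, "E": 0.7}),
         "E": logs({"B": 0.6, "S": 0.4}), "S": logs({"B": 0.6, "S": 0.4})}
EMIT = {"B": logs({"压": 0.5}), "M": {}, "E": logs({"力": 0.5}),
        "S": logs({"的": 0.5})}
TERMS = {"抽油机", "井口"}  # hypothetical petroleum terminology set

print(segment("抽油机的井口压力", TERMS, START, TRANS, EMIT))
```

In this toy run the two terminology entries are kept whole regardless of the HMM parameters, while the remaining characters are tagged by the model alone. A real system would estimate the start, transition, and emission probabilities from an annotated corpus.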

Key words: Chinese word segmentation, Combined character, Hidden Markov model, Petroleum

CLC Number: TP391