Computer Science, 2018, Vol. 45, Issue (1): 128-132. doi: 10.11896/j.issn.1002-137X.2018.01.021

Study on Text Segmentation Based on Domain Ontology

LIU Yao, SHUAI Yuan-hua, GONG Xing-wei and HUANG Yi   

  • Online: 2018-01-15 Published: 2018-11-13

Abstract: Text segmentation plays an important role in information retrieval, abstract generation, question answering, information extraction, and related tasks. After analyzing and summarizing existing methods in China and abroad, this paper proposes a new text segmentation method based on domain ontology. The method first uses initial concepts to automatically obtain a structured set of semantic concepts, which is then used to attach semantic labels to the paragraphs of a text based on the frequency of occurrence, position, and relationships of concepts and properties. Paragraphs with the same semantic annotation are grouped into one semantic paragraph, which helps discover subtopic information and at the same time achieves topic segmentation of the text. Experimental results show that the precision, recall, and F-measure of this method reach 85%, 90%, and 88%, respectively, which is better than most existing methods and meets the needs of practical applications.

Key words: Text segmentation, Domain ontology, Semantic annotation, Semantic paragraph
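
The label-then-group idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny `ONTOLOGY` dictionary, the function names, and the `lead_weight` position bonus are all hypothetical, and relationships between concepts and properties are not modeled. Each paragraph receives the concept label with the highest position-weighted term frequency, and adjacent paragraphs sharing a label are merged into one semantic paragraph.

```python
import re
from itertools import groupby

# Hypothetical mini-ontology (assumed for illustration, not from the paper):
# concept label -> surface terms that signal the concept.
ONTOLOGY = {
    "symptom": ["fever", "cough", "headache"],
    "treatment": ["dose", "tablet", "therapy"],
}


def label_paragraph(paragraph, ontology, lead_weight=2.0):
    """Return the best-scoring concept label for a paragraph, or None.

    The score is term frequency, with terms in the first sentence counted
    `lead_weight` times as a crude stand-in for a position feature.
    """
    sentences = [s for s in paragraph.lower().split(".") if s.strip()]
    scores = {}
    for concept, terms in ontology.items():
        score = 0.0
        for i, sentence in enumerate(sentences):
            weight = lead_weight if i == 0 else 1.0
            words = re.findall(r"[a-z]+", sentence)
            score += weight * sum(words.count(term) for term in terms)
        scores[concept] = score
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def segment(paragraphs, ontology):
    """Merge adjacent paragraphs that share a label into semantic paragraphs."""
    labels = [label_paragraph(p, ontology) for p in paragraphs]
    segments = []
    for label, group in groupby(zip(labels, paragraphs), key=lambda pair: pair[0]):
        segments.append((label, [text for _, text in group]))
    return segments


if __name__ == "__main__":
    paragraphs = [
        "The patient had fever and cough.",
        "Headache followed the fever.",
        "A tablet dose was given. Therapy continued.",
    ]
    for label, group in segment(paragraphs, ONTOLOGY):
        print(label, len(group))
```

Only adjacent paragraphs are merged, so a label that reappears later in the text starts a new semantic paragraph; whether the paper treats such cases the same way is not stated in the abstract.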
