Computer Science, 2018, Vol. 45, Issue (1): 128-132. doi: 10.11896/j.issn.1002-137X.2018.01.021
刘耀,帅远华,龚幸伟,黄毅
LIU Yao, SHUAI Yuan-hua, GONG Xing-wei and HUANG Yi
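Abstract: Text segmentation plays an important role in information retrieval, summarization, question answering, information extraction, and related fields. Building on a review of existing text segmentation methods from China and elsewhere, this paper proposes a method for linear text segmentation based on a domain ontology. Starting from a set of initial concepts, the method automatically acquires a structured set of semantic concepts. It then assigns semantic labels to paragraphs according to factors such as the frequency, position, and relations with which the acquired concepts, attributes, and attribute words occur in the text, thereby mining the text's subtopic information. Paragraphs that carry the same semantic annotation are grouped into the same semantic segment, which separates the text's different subtopics. Experimental results show that on domain-specific text, the method reaches a precision of 85%, a recall of 90%, and an F-measure of 88%. Its segmentation quality meets the needs of practical applications and is better than that of existing text segmentation methods that require no training corpus.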
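The abstract describes the pipeline only at a high level, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. The ontology is a hypothetical hand-written dictionary (concept to attribute words) instead of one acquired automatically from initial concepts. The relation factor is omitted, and the position factor is modeled by an assumed extra weight, `lead_weight`, for terms in a paragraph's first sentence. Evaluation uses exact-match boundary precision, recall, and F1, because the abstract does not state the paper's exact protocol. The names `label_paragraph`, `segment`, and `boundary_prf` are illustrative.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical hand-written ontology: concept -> attribute words.
# (The paper acquires its concept set automatically from initial concepts.)
Ontology = Dict[str, Set[str]]


def label_paragraph(paragraph: str, ontology: Ontology,
                    lead_weight: float = 2.0) -> Optional[str]:
    """Return the concept whose name/attribute words score highest, or None.

    Frequency: every matching token adds to that concept's score.
    Position (assumed heuristic): matches in the first sentence count
    `lead_weight` times. Ties go to the concept listed first in the ontology.
    """
    scores = {concept: 0.0 for concept in ontology}
    sentences = [s for s in re.split(r"[.!?]", paragraph) if s.strip()]
    for i, sentence in enumerate(sentences):
        weight = lead_weight if i == 0 else 1.0
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for concept, attrs in ontology.items():
            terms = attrs | {concept}
            scores[concept] += weight * sum(tok in terms for tok in tokens)
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] == 0:
        return None
    return best


def segment(paragraphs: List[str], ontology: Ontology) -> Tuple[List[str], List[int]]:
    """Label each paragraph, then start a new segment wherever the label changes.

    A paragraph with no matching terms inherits the previous label (an assumption).
    Boundary i means a new segment begins at paragraph i.
    """
    labels: List[str] = []
    for p in paragraphs:
        label = label_paragraph(p, ontology)
        if label is None:
            label = labels[-1] if labels else "none"
        labels.append(label)
    boundaries = [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
    return labels, boundaries


def boundary_prf(predicted: List[int], gold: List[int]) -> Tuple[float, float, float]:
    """Exact-match boundary precision, recall and F1."""
    pred, ref = set(predicted), set(gold)
    hits = len(pred & ref)
    p = hits / len(pred) if pred else 0.0
    r = hits / len(ref) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    ontology = {
        "engine": {"piston", "cylinder", "fuel"},
        "brakes": {"pad", "disc", "caliper"},
    }
    paragraphs = [
        "The engine burns fuel in each cylinder. Each piston moves.",
        "Fuel injection timing affects the piston stroke.",
        "Brakes stop the car. The caliper squeezes the pad against the disc.",
        "Worn pad material reduces brakes performance.",
    ]
    labels, boundaries = segment(paragraphs, ontology)
    print(labels, boundaries)  # prints ['engine', 'engine', 'brakes', 'brakes'] [2]
    print(boundary_prf(boundaries, gold=[2]))  # prints (1.0, 1.0, 1.0)
```

Merging only runs of adjacent paragraphs that share a label keeps the output a linear segmentation; a fuller version would also score relations between concepts, which this sketch leaves out.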