Computer Science ›› 2018, Vol. 45 ›› Issue (6A): 396-397.

• Big Date & Date Mining • Previous Articles     Next Articles

Construction Method of Domain Subject Thesaurus Based on Corpus

AN Ya-wei1,CAO Xiao-chun2,LUO Shun1   

  1. Shanghai General Recognition Technology Institute,Shanghai 201112,China1
    Institute of Information Engineering,Chinese Academy of Sciences,Beijing 100093,China2
  • Online:2018-06-20 Published:2018-08-03

Abstract: To achieve a massive domain corpus oriented subject thesaurus,a method based on feature matrix which is set up by computing words co-occurrence was proposed.By operating on this feature matrix,words are divided into clusters,and central word for each words cluster is calculated.Lexical bundles are finally gained by re-organizing words clusters using central word as a core.The experiment indicates that the proposed method can achieve good precision rate and recall rate.

Key words: Corpus mining, Subject thesaurus, Words cluster dividing, Words co-occurrence feature

CLC Number: 

  • TP391
[1]常春,卢文林.叙词表编制历史、现状与发展[J].农业图书情报学刊,2002(5):25-28.
[2]肖健,徐建,徐晓兰,等.英中可比语料库中多词表达自动提取与对齐[J].计算机工程与应用,2010,46(31):130-134.
[3]陈炯,张永奎.一种基于词聚类的文本特征描述方法[J].计算机系统应用,2011,20(2):211-215.
[4]葛宁,王军.领域Ontology的自动丰富——基于ADL地名表的实例研究[J].计算机科学,2007,34(9):156-162.
[5]奉国和,郑伟.国内中文自动分词技术研究综述[J].图书情报工作,2011,55(2):41-45.
[6]SALTON G,CLEMENT T Y.On the construction of effective vocabularies for information retrieval[C]∥Proc.of 1973 Mee-ting on Programming Languages and Information Retrieval.New York,USA:ACM Press,1973.
[7]丁国栋,白硕,王斌.一种基于局部共现的查询扩展方法[J].中文信息学报,2006,20(3):84-91.
[8]李勇,李苹.主题词表到领域本体的转化研究[J].现代计算机,2013(5):12-15.
[1] CHEN Zhi-qiang, HAN Meng, LI Mu-hang, WU Hong-xin, ZHANG Xi-long. Survey of Concept Drift Handling Methods in Data Streams [J]. Computer Science, 2022, 49(9): 14-32.
[2] WANG Ming, WU Wen-fang, WANG Da-ling, FENG Shi, ZHANG Yi-fei. Generative Link Tree:A Counterfactual Explanation Generation Approach with High Data Fidelity [J]. Computer Science, 2022, 49(9): 33-40.
[3] ZHANG Jia, DONG Shou-bin. Cross-domain Recommendation Based on Review Aspect-level User Preference Transfer [J]. Computer Science, 2022, 49(9): 41-47.
[4] ZHOU Fang-quan, CHENG Wei-qing. Sequence Recommendation Based on Global Enhanced Graph Neural Network [J]. Computer Science, 2022, 49(9): 55-63.
[5] SONG Jie, LIANG Mei-yu, XUE Zhe, DU Jun-ping, KOU Fei-fei. Scientific Paper Heterogeneous Graph Node Representation Learning Method Based onUnsupervised Clustering Level [J]. Computer Science, 2022, 49(9): 64-69.
[6] CHAI Hui-min, ZHANG Yong, FANG Min. Aerial Target Grouping Method Based on Feature Similarity Clustering [J]. Computer Science, 2022, 49(9): 70-75.
[7] ZHENG Wen-ping, LIU Mei-lin, YANG Gui. Community Detection Algorithm Based on Node Stability and Neighbor Similarity [J]. Computer Science, 2022, 49(9): 83-91.
[8] LYU Xiao-feng, ZHAO Shu-liang, GAO Heng-da, WU Yong-liang, ZHANG Bao-qi. Short Texts Feautre Enrichment Method Based on Heterogeneous Information Network [J]. Computer Science, 2022, 49(9): 92-100.
[9] XU Tian-hui, GUO Qiang, ZHANG Cai-ming. Time Series Data Anomaly Detection Based on Total Variation Ratio Separation Distance [J]. Computer Science, 2022, 49(9): 101-110.
[10] NIE Xiu-shan, PAN Jia-nan, TAN Zhi-fang, LIU Xin-fang, GUO Jie, YIN Yi-long. Overview of Natural Language Video Localization [J]. Computer Science, 2022, 49(9): 111-122.
[11] CAO Xiao-wen, LIANG Mei-yu, LU Kang-kang. Fine-grained Semantic Reasoning Based Cross-media Dual-way Adversarial Hashing Learning Model [J]. Computer Science, 2022, 49(9): 123-131.
[12] ZHOU Xu, QIAN Sheng-sheng, LI Zhang-ming, FANG Quan, XU Chang-sheng. Dual Variational Multi-modal Attention Network for Incomplete Social Event Classification [J]. Computer Science, 2022, 49(9): 132-138.
[13] DAI Yu, XU Lin-feng. Cross-image Text Reading Method Based on Text Line Matching [J]. Computer Science, 2022, 49(9): 139-145.
[14] QU Qian-wen, CHE Xiao-ping, QU Chen-xin, LI Jin-ru. Study on Information Perception Based User Presence in Virtual Reality [J]. Computer Science, 2022, 49(9): 146-154.
[15] ZHOU Le-yuan, ZHANG Jian-hua, YUAN Tian-tian, CHEN Sheng-yong. Sequence-to-Sequence Chinese Continuous Sign Language Recognition and Translation with Multi- layer Attention Mechanism Fusion [J]. Computer Science, 2022, 49(9): 155-161.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!