Short Texts Feature Enrichment Method Based on Heterogeneous Information Network

LYU Xiao-feng1,2,3, ZHAO Shu-liang1,2,3, GAO Heng-da4, WU Yong-liang5, ZHANG Bao-qi1,2,3   

  1. College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
  2. Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security, Hebei Normal University, Shijiazhuang 050024, China
  3. Hebei Provincial Key Laboratory of Network & Information Security, Hebei Normal University, Shijiazhuang 050024, China
  4. Software College, Hebei Normal University, Shijiazhuang 050024, China
  5. School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang 050043, China
  • Received: 2021-07-26 Revised: 2021-10-17 Online: 2022-09-15 Published: 2022-09-09
  • Supported by:
    National Social Science Fund of China (13&ZD091, 18ZDA200), Hebei Provincial Key Research and Development Project of China (20370301D) and Key Technology Development Project of Hebei Normal University (L2020K01).

Abstract: As computer technology becomes deeply integrated into social life, more and more short texts are spread across web platforms. To address the data sparsity of short texts, this paper constructs HTE, a robust heterogeneous information network framework for modeling short texts, which can integrate any type of additional information and capture the relationships among them. Based on this framework, six short text expansion methods are designed using different external knowledge. They enrich short text features with entity information (entities, entity categories, and inter-entity relationships) and textual information (such as text topics) drawn from the Wikipedia and Freebase knowledge bases. The methods are evaluated through similarity measurement. On two short text datasets, the six text expansion methods are compared with three traditional similarity measures and with current mainstream short text matching algorithms, and the six proposed methods achieve improved results. Compared with BERT, the similarity measurement result of the best method improves by 5.97%. The proposed framework is robust and can include any type of external knowledge, and the proposed method can overcome the data sparsity of short texts and measure their similarity with high accuracy in an unsupervised manner.
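
The general idea behind feature enrichment can be illustrated with a minimal Python sketch. This is not the HTE framework or any of the six methods from the paper: it simply appends linked entities and their categories to a bag of words and compares texts by cosine similarity, without meta-paths or a network structure. The `ENTITY_LINKS` and `ENTITY_TYPES` tables are a hypothetical toy stand-in for a knowledge base such as Wikipedia or Freebase, and all function names are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy knowledge base (an assumption for illustration only):
# surface word -> linked entity, and entity -> list of categories.
ENTITY_LINKS = {"jobs": "Steve_Jobs", "cook": "Tim_Cook", "apple": "Apple_Inc"}
ENTITY_TYPES = {
    "Steve_Jobs": ["person", "executive"],
    "Tim_Cook": ["person", "executive"],
    "Apple_Inc": ["company"],
}


def word_features(tokens):
    """Plain bag-of-words features for a tokenized short text."""
    return Counter("word:" + t for t in tokens)


def enriched_features(tokens):
    """Bag of words extended with linked entities and their categories."""
    feats = word_features(tokens)
    for t in tokens:
        entity = ENTITY_LINKS.get(t)
        if entity is None:
            continue  # word has no entry in the toy knowledge base
        feats["entity:" + entity] += 1
        for category in ENTITY_TYPES.get(entity, []):
            feats["type:" + category] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Two short texts that share no words but mention related entities.
t1 = ["jobs", "founded", "apple"]
t2 = ["cook", "leads", "iphone", "maker"]

plain = cosine(word_features(t1), word_features(t2))
enriched = cosine(enriched_features(t1), enriched_features(t2))
print(plain, enriched)
```

In this toy case the two texts have no words in common, so the plain bag-of-words similarity is zero, while the enriched vectors overlap on the shared entity categories. This is the sparsity effect that external knowledge is meant to counter.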

Key words: Heterogeneous information network, Short text enrichment method, Short text matching, Knowledge base, Meta-path

CLC Number: 

  • TP391
