Computer Science, 2022, Vol. 49, Issue (9): 92-100. doi: 10.11896/jsjkx.210700241

• Database & Big Data & Data Science •

Short Texts Feature Enrichment Method Based on Heterogeneous Information Network

LYU Xiao-feng1,2,3, ZHAO Shu-liang1,2,3, GAO Heng-da4, WU Yong-liang5, ZHANG Bao-qi1,2,3   

    1 College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
    2 Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security, Hebei Normal University, Shijiazhuang 050024, China
    3 Hebei Provincial Key Laboratory of Network & Information Security, Hebei Normal University, Shijiazhuang 050024, China
    4 Software College, Hebei Normal University, Shijiazhuang 050024, China
    5 School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang 050043, China
  • Received: 2021-07-26 Revised: 2021-10-17 Online: 2022-09-15 Published: 2022-09-09
  • About author: LYU Xiao-feng, born in 1996, postgraduate. His main research interests include machine learning and intelligent information processing.
    ZHAO Shu-liang, born in 1967, Ph.D, professor, Ph.D supervisor, is a member of the China Computer Federation. His main research interests include machine learning and intelligent information processing.
  • Supported by:
    National Social Science Fund of China (13&ZD091, 18ZDA200), Hebei Provincial Key Research and Development Project of China (20370301D) and Key Technology Development Project of Hebei Normal University (L2020K01).

Abstract: As computer technology becomes deeply integrated into social life, more and more short text messages spread across web platforms. To address the data sparsity of short texts, a robust heterogeneous information network framework (HTE) for modeling short texts is constructed; it can integrate any type of additional information and capture the relationships among them. Based on this framework, six short text expansion methods are designed using different external knowledge. They enrich short text features by introducing entity information, such as entities, entity categories and inter-entity relationships, and textual information, such as text topics, from the Wikipedia and Freebase knowledge bases. Finally, similarity measurement results are used to verify the effect of the methods. On two short text datasets, the six text expansion methods are compared with three traditional similarity measures and with current mainstream short text matching algorithms, and the proposed methods achieve improved results. Compared with BERT, the similarity measurement result of the best method improves by 5.97%. The proposed framework is robust and can include any type of external knowledge, and the proposed method can overcome the data sparsity problem of short texts and measure the similarity of short texts with high accuracy in an unsupervised manner.
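The abstract does not give the details of HTE or of its six expansion methods, so the following is only a toy sketch of the general idea: two short texts that share no words can still be matched once they are enriched with typed nodes (entities and entity categories) taken from an external knowledge source. The lookup table `ENTITY_CATEGORY`, the functions `bag_of_words`, `enrich` and `cosine`, and the use of cosine similarity are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from math import sqrt

# Hypothetical stand-in for a knowledge base such as Wikipedia or Freebase:
# maps an entity mention to a single entity category.
ENTITY_CATEGORY = {
    "python": "programming_language",
    "java": "programming_language",
    "paris": "city",
    "london": "city",
}


def bag_of_words(text):
    """Plain word features only (the sparse baseline)."""
    return Counter("word:" + token for token in text.lower().split())


def enrich(text):
    """Word features plus typed entity and category features."""
    features = bag_of_words(text)
    for token in text.lower().split():
        category = ENTITY_CATEGORY.get(token)
        if category is not None:
            features["entity:" + token] += 1
            features["category:" + category] += 1
    return features


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = sqrt(sum(count * count for count in a.values()))
    norm_b = sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    t1, t2 = "learn python", "study java"
    print(cosine(bag_of_words(t1), bag_of_words(t2)))  # no shared words
    print(cosine(enrich(t1), enrich(t2)))              # shared category node
```

In this sketch the two texts have zero word overlap, and the only shared feature after enrichment is the category node, which is what lifts the similarity above zero. A real heterogeneous information network would also model inter-entity relations and topics, and would typically compare texts along meta-paths rather than over a flat feature bag.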

Key words: Heterogeneous information network, Short text enrichment method, Short text matching, Knowledge base, Meta-path

CLC Number: TP391